suleman mubarik created LUCENE-5943:
---------------------------------------
Summary: HTML strip filter removes text between < and >
Key: LUCENE-5943
URL: https://issues.apache.org/jira/browse/LUCENE-5943
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Environment: Production
Reporter: suleman mubarik
Given the input "I love <pizza hut> so much", the HTML strip filter removes "pizza hut", and I get the tokens "i", "love", "so", "much"
with these offsets: ((0,1), (2,6), (20,22), (23,27)).
Since "<pizza hut>" is not HTML markup, the HTML strip filter should instead return "i", "love", "pizza", "hut", "so", "much".
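
The behaviour described above can be reproduced outside Lucene with a small, self-contained sketch. This is not Lucene's HTMLStripCharFilter code; it only illustrates, under stated assumptions, how a stripper that treats any "<name ...>" span as a tag drops "pizza hut", and how one possible alternative (stripping only element names from a known list) would keep it. The class name StripSketch, the KNOWN_TAGS list, the regular expression, and the simple tokenizer are all hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StripSketch {
    // Hypothetical, deliberately tiny list of element names treated as real HTML.
    private static final Set<String> KNOWN_TAGS = new HashSet<>(
            Arrays.asList("a", "b", "br", "div", "em", "i", "p", "span", "strong"));

    // Anything shaped like <name ...> or </name ...>; group 1 is the name.
    private static final Pattern TAG_LIKE =
            Pattern.compile("</?([A-Za-z][A-Za-z0-9]*)[^<>]*>");

    // Assumed model of the reported behaviour: every tag-shaped span is dropped.
    static String stripAllTagLike(String input) {
        return TAG_LIKE.matcher(input).replaceAll(" ");
    }

    // Alternative sketch: drop only known elements; for anything else,
    // blank out the angle brackets and keep the inner text.
    static String stripKnownTagsOnly(String input) {
        Matcher m = TAG_LIKE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            String replacement = KNOWN_TAGS.contains(name)
                    ? " "
                    : m.group().replace('<', ' ').replace('>', ' ');
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Minimal stand-in for a tokenizer plus lowercase filter.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "I love <pizza hut> so much";
        System.out.println(tokenize(stripAllTagLike(input)));    // [i, love, so, much]
        System.out.println(tokenize(stripKnownTagsOnly(input))); // [i, love, pizza, hut, so, much]
    }
}
```

The sketch does not track offsets. A whitelist is only one possible design; it would also pass through unknown but genuine markup, so whether it is acceptable depends on the input being indexed.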