Hi,
I'm trying to migrate to Lucene 4.
In Lucene 3.5 I extended org.apache.lucene.analysis.FilteringTokenFilter
and overrode accept() to remove undesired shingles. In Lucene 4,
org.apache.lucene.analysis.FilteringTokenFilter no longer seems to exist.
I'm trying to achieve two things:
1) Remove shingles that have an empty item.
2) Remove shingles that span a comma in the original phrase. For example,
for the phrase: "delicious red apples, green pears, and oranges"
I want the following shingles (with a shingle size of 2):
"delicious red", "red apples", "green pears", "and oranges"
(no "apples green" because there's a comma)
(no "pears and" because there's a comma)
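
To make the expected output concrete, here is a minimal plain-Java sketch of the behaviour I'm after. It deliberately uses no Lucene classes; the class name ShingleSketch and the method shingles() are made up for illustration only. It treats each comma-separated segment on its own, so no shingle can cross a comma, and it skips empty segments, so an empty item never ends up inside a shingle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of the desired output; it does not use any Lucene classes.
// ShingleSketch and shingles() are hypothetical names chosen for this example.
class ShingleSketch {

    // Builds word shingles of exactly `size` words that never cross a comma.
    static List<String> shingles(String phrase, int size) {
        List<String> result = new ArrayList<>();
        // Each comma-separated segment is shingled independently.
        for (String segment : phrase.split(",")) {
            String trimmed = segment.trim();
            if (trimmed.isEmpty()) {
                continue; // nothing between two commas, so no words to shingle
            }
            String[] words = trimmed.split("\\s+");
            // Slide a window of `size` words over the segment.
            for (int i = 0; i + size <= words.length; i++) {
                result.add(String.join(" ", Arrays.copyOfRange(words, i, i + size)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [delicious red, red apples, green pears, and oranges]
        System.out.println(shingles("delicious red apples, green pears, and oranges", 2));
    }
}
```

What I'm looking for is the equivalent of this inside a Lucene 4 analysis chain, applied to the shingles as they are produced.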
Any ideas?
TIA