I considered it, and it's definitely an option,
but I read in the book "Lucene in Action" that MappingCharFilter is
inefficient, and I'm not sure I need it. If implementing my own
involves a lot of coding, then I might resort to it, since I don't have
large data sets to index at this time.
Thanks for your answer,
Igal
On 11/3/2012 4:42 PM, Robert Muir wrote:
On Sat, Nov 3, 2012 at 7:35 PM, Igal @ getRailo.org <i...@getrailo.org> wrote:
Hi,
I want to make sure that every comma (,) and semicolon (;) is followed by a
space prior to tokenizing.
The idea is to then use a WhitespaceTokenizer, which will keep the commas but
still split the phrase in a case like:
"I bought red apples,green pears,and yellow oranges"
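To make the goal concrete, here is a small plain-Java sketch (no Lucene involved) that splits the example phrase on whitespace before and after placing a space after each comma or semicolon. Splitting on `\s+` is only an approximation of WhitespaceTokenizer, the regex lookahead only inserts a space where one is missing, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: approximates whitespace tokenization with String.split
public class CommaSplitDemo {
    // Split on runs of whitespace, roughly what a whitespace tokenizer does
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Insert a space after every ',' or ';' that is directly followed by a non-space char
    static String injectSpaces(String text) {
        return text.replaceAll("([,;])(?=\\S)", "$1 ");
    }

    public static void main(String[] args) {
        String phrase = "I bought red apples,green pears,and yellow oranges";
        // [I, bought, red, apples,green, pears,and, yellow, oranges]
        System.out.println(whitespaceTokens(phrase));
        // [I, bought, red, apples,, green, pears,, and, yellow, oranges]
        System.out.println(whitespaceTokens(injectSpaces(phrase)));
    }
}
```

Without the injected spaces, `apples,green` stays a single token; with them, each comma remains attached to the word before it while the following word becomes its own token.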
I'm thinking of extending CharFilter to "inject" a space after each comma.
My questions are:
1) Does this make sense, or am I completely off here?
2) Are there any code examples of CharFilter implementations that
inject a char?
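Since Lucene's CharFilter extends java.io.Reader, the core of question 2 is a Reader that emits one extra character. The sketch below is a plain java.io.FilterReader, not a Lucene CharFilter, and the class name is hypothetical. It injects a space after every ',' and ';' unconditionally, so an existing space becomes a double space, which a whitespace tokenizer ignores. A real CharFilter would also have to report offset corrections (Lucene exposes `correctOffset` for this), because each injected space shifts all later offsets by one; that part is omitted here.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical example: a plain java.io.FilterReader, NOT a Lucene CharFilter.
// It emits an extra ' ' after every ',' or ';' read from the wrapped Reader.
public class SpaceInjectingReader extends FilterReader {
    // True when the last char returned was ',' or ';' and its space is still owed
    private boolean spacePending = false;

    public SpaceInjectingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        if (spacePending) {
            spacePending = false;
            return ' ';
        }
        int c = in.read();
        if (c == ',' || c == ';') {
            spacePending = true;
        }
        return c;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        // Fill the buffer one char at a time so the injection logic lives only in read()
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            cbuf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Skip through read() so injected spaces are counted consistently
        long skipped = 0;
        while (skipped < n && read() != -1) {
            skipped++;
        }
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset on the wrapped Reader would lose spacePending
    }

    public static void main(String[] args) throws IOException {
        Reader r = new SpaceInjectingReader(new StringReader("apples,green pears;and"));
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4];
        int n;
        while ((n = r.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        System.out.println(sb); // apples, green pears; and
    }
}
```

Filling the buffer one char at a time keeps the state handling in a single place; it is not the fastest approach, but for small data sets it should not matter.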
Can't you just use something like MappingCharFilter with a single
mapping of "," to ", " ?
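One quick check that an unconditional mapping like this is harmless: text that already has a space after the comma simply ends up with two spaces, and whitespace splitting collapses them. The snippet below emulates the mapping with String.replace rather than using Lucene's MappingCharFilter (in Lucene 4.x the rules themselves would be registered through NormalizeCharMap.Builder, if I recall the API correctly). It adds a second rule for ';', since the original question covers semicolons too. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a "," -> ", " and ";" -> "; " char mapping
// (an emulation with String.replace, not Lucene's MappingCharFilter)
public class MappingEmulation {
    static String applyMapping(String text) {
        return text.replace(",", ", ").replace(";", "; ");
    }

    // Split on runs of whitespace, so doubled spaces do not produce empty tokens
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String mixed = "red apples, green pears,and yellow;oranges";
        String mapped = applyMapping(mixed);
        System.out.println(mapped);                   // red apples,  green pears, and yellow; oranges
        System.out.println(whitespaceTokens(mapped)); // [red, apples,, green, pears,, and, yellow;, oranges]
    }
}
```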
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org