Valery, have you tried using WhitespaceTokenizer / CharTokenizer and
doing any further processing in a custom TokenFilter?
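
A rough plain-Java sketch of that idea follows. It does not use Lucene's
actual Tokenizer/TokenFilter API; the class name, the method names and the
protected-term list are all made up for illustration. Stage 1 splits on
whitespace only, and stage 2 plays the role of the custom filter by splitting
on '/' unless the token is a known term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SlashSplitSketch {

    // Hypothetical list of terms that must survive as single tokens.
    static final Set<String> PROTECTED = Set.of("R/3");

    // Stage 1: split on whitespace only, so "C/C++" and "R/3" stay intact.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stage 2: the "filter" step. Split a token on '/' unless it is protected.
    static List<String> splitOnSlash(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (PROTECTED.contains(token)) {
                out.add(token);
                continue;
            }
            for (String part : token.split("/")) {
                if (!part.isEmpty()) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    // Chain the two stages, as a tokenizer followed by one filter would be.
    static List<String> analyze(String text) {
        return splitOnSlash(whitespaceTokens(text));
    }

    public static void main(String[] args) {
        System.out.println(analyze("C/C++"));   // prints [C, C++]
        System.out.println(analyze("SAP R/3")); // prints [SAP, R/3]
    }
}
```

In a real analyzer the same two stages would presumably be a whitespace-based
tokenizer followed by a custom filter; the protected-term list is only one
possible way to decide which tokens to leave alone.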

simon

On Thu, Aug 20, 2009 at 8:48 PM, Robert Muir<rcm...@gmail.com> wrote:
> Valery, I think it all depends on how you want your search to work.
>
> For example: if a document contains only "C++", do you want searches
> on just "C" to match or not?
>
> Another thing I would suggest is to take a look at the capabilities of
> Solr: it has some analysis components that might be useful for your
> needs.
> The wiki page is here: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
>
> On Thu, Aug 20, 2009 at 1:46 PM, Valery<khame...@gmail.com> wrote:
>>
>> Hi Robert,
>>
>> so, would you expect a Tokenizer to consider '/' in both cases as a separate
>> Token?
>>
>> Personally, I see no problem if the Tokenizer did the following job:
>>
>> "C/C++" ==> TokenStream of { "C", "/", "C", "+", "+"}
>> and then ended up with "C" and "C++" tokens after processing by the
>> downstream TokenFilters.
>>
>> Similarly:
>>
>> "SAP R/3" ==> TokenStream of { "SAP", "R", "/", "3"}
>> and getting { "SAP", "R", "/", "3", "R/3", "SAP R/3"} later.
>>
>> I try to follow the principle that a token (or its lexeme) should usually
>> never be parsed again. One can build more complex (compound) things from
>> the tokens. However, one usually never chops a lexeme into smaller pieces.
>>
>> What do you think, Robert?
>>
>> regards,
>> Valery
>>
>> --
>> View this message in context: 
>> http://www.nabble.com/Any-Tokenizator-friendly-to-C%2B%2B%2C-C-%2C-.NET%2C-etc---tp25063175p25066762.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>>
>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>
