Hello,
For some time I have been trying to apply ShingleFilter. I have a string:
"The users get program in the User RPC API in Apache Rave 0.11"
and I would like to get:
[the users get] [users get program] [get program in] [program in the]
[in the user] [the user rpc] [user rpc api] [rpc api in] [api in apache]
[in apache rave] [apache rave 0.11]
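To make the expected output concrete, here is a minimal plain-Java sketch that builds the same 3-word shingles without Lucene. The class name ShingleSketch and the method shingles are hypothetical names used only for illustration. The sketch assumes simple whitespace splitting instead of StandardTokenizer, and an input string that ends with "0.11", as the expected list above does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ShingleSketch {

    // Returns every run of `size` consecutive lowercased words in `text`.
    // Whitespace splitting is an assumption; it only approximates StandardTokenizer.
    static List<String> shingles(String text, int size) {
        String[] words = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int i = 0; i + size <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + size)));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "The users get program in the User RPC API in Apache Rave 0.11";
        // Print each shingle in brackets, in the same style as the lists in this mail.
        for (String shingle : shingles(text, 3)) {
            System.out.print("[" + shingle + "] ");
        }
        System.out.println();
    }
}
```

This only describes the output I am after; it says nothing about how ShingleFilter behaves internally.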
However, I'm getting:
[the users get] [users] [users get program] [get] [get program in]
[program] [program in the] [in the user] [the user rpc] [user]
[user rpc api] [rpc] [rpc api in] [api] [api in apache] [in apache rave]
[apache] [apache rave 0.11] [rave]
Part of my code:

    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        StandardTokenizer source = new StandardTokenizer(Version.LUCENE_43, reader);
        TokenStream tokenStream = new StandardFilter(Version.LUCENE_43, source);
        tokenStream = new LowerCaseFilter(Version.LUCENE_43, tokenStream);
        // minShingleSize = 3, maxShingleSize = 3
        tokenStream = new ShingleFilter(tokenStream, 3, 3);
        tokenStream = new StopFilter(Version.LUCENE_43, tokenStream,
                StopAnalyzer.ENGLISH_STOP_WORDS_SET);
        return new TokenStreamComponents(source, tokenStream);
    }

Could somebody please explain why I'm getting single-word shingles
when I set the minimum shingle size to 3?
Thanks,
--
gosia