Help with token streams and graphs

Nicolás Lichtmaier Wed, 20 Mar 2019 10:52:53 -0700

I'm trying to make synonyms work correctly, and for that I'm trying to better understand graphs in a token stream.


For that purpose I've built this code:

    Builder builder = CustomAnalyzer.builder();
    builder.withTokenizer(StandardTokenizerFactory.class);
    MySynonymGraphFilterFactory.registerSynonyms(Arrays.asList(
        Arrays.asList("go to", "navigate", "open")
    ));
    builder.addTokenFilter(MySynonymGraphFilterFactory.class, "synonyms", "unused");

MySynonymGraphFilterFactory is just a hack to pass a list of lists for synonyms. It expands everything, mapping everything to everything.

    builder.addTokenFilter(FlattenGraphFilterFactory.class); // nothing changes with this!
    Analyzer analyzer = builder.build();
    TokenStream ts = analyzer.tokenStream("*", new StringReader("go to the webpage!"));

Then I call a function that just dumps terms, position increments and position lengths:


            System.out.println(LoggingFilter.tokenStreamToString(ts));

What I don't understand is this: I get the same output whether or not I include FlattenGraphFilter. This is the output:


   navigate<2> (0)open<2> (0)go  to  the  webpage

(Angle brackets show the position length of the preceding term; parentheses show the position increment of the following term.)
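To make that notation concrete, here is a minimal sketch that turns the dumped tokens into graph arcs. It does not use Lucene at all: Tok, arcs and TokenGraphSketch are hypothetical names invented for this illustration, and the increment and length values are my reading of the output above, assuming every unmarked token has position increment 1 and position length 1 (including the increment of the first token, "navigate").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenGraphSketch {

    /** Hypothetical stand-in for one token: term, position increment, position length. */
    static final class Tok {
        final String term;
        final int posInc;
        final int posLen;

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Renders each token as an arc "term: from->to" between position nodes. */
    static List<String> arcs(List<Tok> tokens) {
        List<String> out = new ArrayList<>();
        int pos = -1; // start at -1 so that a first increment of 1 lands on position 0
        for (Tok t : tokens) {
            pos += t.posInc;          // node where this token starts
            int end = pos + t.posLen; // node where this token ends
            out.add(t.term + ": " + pos + "->" + end);
        }
        return out;
    }

    public static void main(String[] args) {
        // Values assumed from the dump: navigate<2> (0)open<2> (0)go  to  the  webpage
        List<Tok> tokens = Arrays.asList(
            new Tok("navigate", 1, 2),
            new Tok("open", 0, 2),
            new Tok("go", 0, 1),
            new Tok("to", 1, 1),
            new Tok("the", 1, 1),
            new Tok("webpage", 1, 1));
        for (String arc : arcs(tokens)) {
            System.out.println(arc);
        }
    }
}
```

Under those assumptions, "navigate" and "open" each span the same two positions that "go" and "to" cover one after the other.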

There's something I'm not understanding here. I'd thought that flattening the stream meant that no token would have a position length > 1. Was I wrong? I would greatly appreciate any help with understanding this.


Thanks!

Nicolás.-
