Hello,
I've found a number of posts in different places talking about how to perform
decompounding, but I haven't found too many discussing how to use the results
of decompounding. If anyone can answer this question or point me to an existing
discussion it would be very helpful.
In the description of the org.apache.lucene.analysis.compound package, it gives
the following example:
Rindfleischüberwachungsgesetz, 0, 29
Rind, 0, 4, posIncr=0
fleisch, 4, 11, posIncr=0
überwachung, 11, 22, posIncr=0
gesetz, 23, 29, posIncr=0
And I see how this allows me to find single components such as "gesetz" or
"Rind". But what if I want to find combinations of components such as
"Rindfleisch" or "überwachungsgesetz"? It seems that the pattern of using
posIncr=0 for all components excludes the possibility of finding sub-strings
that are made up of multiple components.
Any comments or thoughts would be appreciated.
Ben Douglas
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]