I'm interested in this analyzer. It had escaped me, and it solves an old problem!
Could you report on its usage?
- Did you have to feed words into a dictionary?
- Does anyone have user measurements already?

And a last question, for the research fun: is there any approach for preferring Überwachungsgesetz as a concept over, say, Fleischüberwachung? (Again, that could probably be based on a dictionary.)
Thanks in advance,
Paul

On 21 Oct 2009, at 04:00, Robert Muir wrote:
Hi,

It will work, because it will also decompound "Rindfleisch" into Rind and fleisch, with posIncr=0. So if you index Rindfleischüberwachungsgesetz and then query with "Rindfleisch", it matches, because Rindfleisch also gets decompounded into Rind and fleisch.

On Tue, Oct 20, 2009 at 8:35 PM, Benjamin Douglas <bbdoug...@basistech.com> wrote:

Hello,

I've found a number of posts in different places talking about how to perform decompounding, but I haven't found too many discussing how to use the results of decompounding. If anyone can answer this question or point me to an existing discussion, it would be very helpful.

In the description of the org.apache.lucene.analysis.compound package, it gives the following example:

Rindfleischüberwachungsgesetz, 0, 29
Rind, 0, 4, posIncr=0
fleisch, 4, 11, posIncr=0
überwachung, 11, 22, posIncr=0
gesetz, 23, 29, posIncr=0

And I see how this allows me to find single components such as "gesetz" or "Rind". But what if I want to find combinations of components such as "Rindfleisch" or "überwachungsgesetz"? It seems that the pattern of using posIncr=0 for all components excludes the possibility of finding sub-strings that are made up of multiple components.

Any comments or thoughts would be appreciated.

Ben Douglas

--
Robert Muir
rcm...@gmail.com
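To make the matching argument above concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: the class DecompoundDemo, its toy dictionary, and the analyze/matchesAny/matchesAll helpers are hypothetical names made up for illustration, and the substring lookup only imitates what a dictionary-based decompounder emits. The assumption is that the same analysis runs at index time and at query time, and that all sub-words share the position of the original token (posIncr=0).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DecompoundDemo {
    // Hypothetical toy dictionary; a real setup would supply its own word list.
    static final List<String> DICT =
            Arrays.asList("rind", "fleisch", "überwachung", "gesetz");

    /** The lowercased token plus every dictionary word found inside it.
     *  All returned tokens are treated as sharing one position (posIncr=0). */
    static Set<String> analyze(String word) {
        String lower = word.toLowerCase(Locale.GERMAN);
        Set<String> tokens = new LinkedHashSet<>();
        tokens.add(lower);
        for (String part : DICT) {
            if (lower.contains(part)) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** OR semantics: the query matches if it shares at least one token with the document. */
    static boolean matchesAny(String indexed, String query) {
        Set<String> shared = new HashSet<>(analyze(indexed));
        shared.retainAll(analyze(query));
        return !shared.isEmpty();
    }

    /** AND semantics: every sub-word of the query must occur in the document. */
    static boolean matchesAll(String indexed, String query) {
        Set<String> parts = new LinkedHashSet<>(analyze(query));
        if (parts.size() > 1) {
            // Drop the whole-word token so that only the sub-words are required.
            parts.remove(query.toLowerCase(Locale.GERMAN));
        }
        return analyze(indexed).containsAll(parts);
    }

    public static void main(String[] args) {
        System.out.println(analyze("Rindfleischüberwachungsgesetz"));
        System.out.println(matchesAny("Rindfleischüberwachungsgesetz", "Rindfleisch"));
        System.out.println(matchesAny("Überwachungsgesetz", "Fleischüberwachung"));
        System.out.println(matchesAll("Überwachungsgesetz", "Fleischüberwachung"));
    }
}
```

In this sketch the query "Rindfleisch" finds Rindfleischüberwachungsgesetz because both sides yield rind and fleisch. It also shows the precision problem raised at the top of the thread: with OR semantics, "Fleischüberwachung" matches Überwachungsgesetz through the shared überwachung, while requiring all sub-words rejects it. Which behaviour a real query parser produces depends on its configuration, so that part is left as an assumption here.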