[ https://issues.apache.org/jira/browse/LUCENE-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573508#action_12573508 ]
Otis Gospodnetic commented on LUCENE-1166:
------------------------------------------

Thomas,

I think that might work for Chinese: going through the "string" of Chinese characters one at a time and looking up a dictionary after each additional character. Once you find a dictionary match, you look at one more character. If that also matches a dictionary entry, keep doing this for as long as you keep matching dictionary entries (in order to grab the longest dictionary-matching string of characters). If the next character does not match, then the previous/last character was the end of the dictionary entry. That would work, no?

As for the license info, I think you could take the approach where the required libraries are not included in the contribution in the ASF repo, but are downloaded on the fly at build time, much like some other contributions do. Could you do that?

> A tokenfilter to decompose compound words
> -----------------------------------------
>
>                 Key: LUCENE-1166
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1166
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Analysis
>            Reporter: Thomas Peuss
>         Attachments: CompoundTokenFilter.patch, CompoundTokenFilter.patch, CompoundTokenFilter.patch, de.xml, hyphenation.dtd
>
>
> A tokenfilter to decompose compound words you find in many Germanic languages (like German, Swedish, ...) into single tokens.
> An example: Donaudampfschiff would be decomposed to Donau, dampf, schiff so that you can find the word even when you only enter "Schiff".
> I use the hyphenation code from the Apache XML project FOP (http://xmlgraphics.apache.org/fop/) to do the first step of decomposition.
> Currently I use the FOP jars directly. I only use a handful of classes from the FOP project.
> My question now:
> Would it be OK to copy these classes over to the Lucene project (renaming the packages, of course) or should I stick with the dependency on the FOP jars? The FOP code uses the ASF V2 license as well.
> What do you think?
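Otis's greedy longest-match idea for Chinese can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not code from the attached patches and not a Lucene API: `GreedySegmenter` and `segment` are hypothetical names, the dictionary is assumed to be an in-memory `Set<String>`, and a character that starts no dictionary entry is emitted as a single-character token (the comment does not say what should happen there). It also differs slightly from the description: instead of stopping at the first extension that fails to match, it tries every length up to the longest dictionary entry, so a long entry is still found even when one of its prefixes is not itself a word. This variant is usually called forward maximum matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper (not part of Lucene): greedy longest-match
 * segmentation against a dictionary, also known as forward maximum matching.
 */
public class GreedySegmenter {

    /** Splits text into tokens, preferring the longest dictionary match at each position. */
    public static List<String> segment(String text, Set<String> dictionary) {
        // The longest dictionary entry bounds how far ahead we ever need to look.
        int maxLen = 0;
        for (String word : dictionary) {
            maxLen = Math.max(maxLen, word.length());
        }
        List<String> tokens = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = -1;
            // Try candidate lengths from longest to shortest, so the
            // first hit is the longest dictionary match at this position.
            for (int len = Math.min(maxLen, text.length() - start); len >= 1; len--) {
                if (dictionary.contains(text.substring(start, start + len))) {
                    end = start + len;
                    break;
                }
            }
            if (end < 0) {
                // No dictionary entry starts here: emit one char and move on
                // (assumption: the original comment does not cover this case).
                end = start + 1;
            }
            tokens.add(text.substring(start, end));
            start = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("donau", "dampf", "schiff", "dampfschiff");
        System.out.println(segment("donaudampfschiff", dict));
    }
}
```

With the dictionary in `main`, this prints `[donau, dampfschiff]`: the greedy rule keeps the longer entry `dampfschiff`, so producing the separate `dampf` and `schiff` tokens the issue asks for would need an extra step, such as also emitting the dictionary sub-words of each match. The sketch indexes UTF-16 `char`s, which covers most Chinese characters in everyday use but would split characters outside the Basic Multilingual Plane.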