[ https://issues.apache.org/jira/browse/LUCENE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703670#action_12703670 ]
Felipe Sánchez Martínez commented on LUCENE-1284:
-------------------------------------------------

Hi,

I think that the tool's reliance on an external free/open-source package
to pre-process the files to be indexed should not be an obstacle for the
community to benefit from it; the world is pretty heterogeneous ;).
Furthermore, the external tools are not required at search time.

> Felipe, although Java equivalents of those command-line tools don't exist
> currently, do you think one could implement them in Java (and release them
> under ASL)?

This year the Apertium project is taking part in the Google Summer of Code,
and a student will port the lttoolbox package to Java. Note that the tool I
am contributing also uses the Apertium tagger, which will not be ported;
fortunately, use of the tagger is optional.

The Java version of lttoolbox will be released under the GPL; I am not sure
whether the developers will agree to dual-license it.

--
Felipe

> Set of Java classes that allow the Lucene search engine to use morphological
> information developed for the Apertium open-source machine translation
> platform (http://www.apertium.org)
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-1284
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1284
>             Project: Lucene - Java
>          Issue Type: New Feature
>         Environment: New feature developed under GNU/Linux, but it should
>                      work on any other Java-compliant platform
>            Reporter: Felipe Sánchez Martínez
>            Assignee: Otis Gospodnetic
>         Attachments: apertium-morph.0.9.0.tgz
>
>
> Set of Java classes that allow the Lucene search engine to use morphological
> information developed for the Apertium open-source machine translation
> platform (http://www.apertium.org).
> Morphological information is used to index new documents and to process
> smarter queries in which morphological attributes can be used to specify
> query terms.
> The tool makes use of morphological analyzers and dictionaries developed for
> the open-source machine translation platform Apertium (http://apertium.org)
> and, optionally, the part-of-speech taggers developed for it. Currently there
> are morphological dictionaries available for Spanish, Catalan, Galician,
> Portuguese, Aranese, Romanian, French and English. In addition, new
> dictionaries are being developed for Esperanto, Occitan, Basque, Swedish,
> Danish, Welsh, Polish and Italian, among others; we hope more language pairs
> will be added to the Apertium machine translation platform in the near future.