[ 
https://issues.apache.org/jira/browse/LUCENE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703670#action_12703670
 ] 

Felipe Sánchez Martínez commented on LUCENE-1284:
-------------------------------------------------

Hi, 

I think that the tool's reliance on an external free/open-source package to 
pre-process the files to be indexed should not be an obstacle for the 
community to benefit from it; the world is pretty heterogeneous ;). 
Furthermore, that package is not required at search time. 

> Felipe, although Java equivalents of those command-line tools don't exist 
> currently, do you think one could implement them in Java (and release them 
> under ASL)? 

This year the Apertium project is taking part in Google Summer of Code. A 
student will port the lttoolbox package to Java. Note that the tool I 
contributed also uses the Apertium tagger, which will not be ported; 
fortunately, the use of the tagger is optional. The Java version of lttoolbox 
will be released under the GPL; I am not sure whether they would agree to 
dual-license it.

--
Felipe

> Set of Java classes that allow the Lucene search engine to use morphological 
> information developed for the Apertium open-source machine translation 
> platform (http://www.apertium.org)
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1284
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1284
>             Project: Lucene - Java
>          Issue Type: New Feature
>         Environment: New feature developed under GNU/Linux, but it should 
> work on any other Java-compliant platform
>            Reporter: Felipe Sánchez Martínez
>            Assignee: Otis Gospodnetic
>         Attachments: apertium-morph.0.9.0.tgz
>
>
> Set of Java classes that allow the Lucene search engine to use morphological 
> information developed for the Apertium open-source machine translation 
> platform (http://www.apertium.org). Morphological information is used to 
> index new documents and to process smarter queries in which morphological 
> attributes can be used to specify query terms.
> The tool makes use of morphological analyzers and dictionaries developed for 
> the open-source machine translation platform Apertium (http://apertium.org) 
> and, optionally, the part-of-speech taggers developed for it. Currently there 
> are morphological dictionaries available for Spanish, Catalan, Galician, 
> Portuguese, Aranese, Romanian, French and English. In addition, new 
> dictionaries are being developed for Esperanto, Occitan, Basque, Swedish, 
> Danish, Welsh, Polish and Italian, among others; we hope more language 
> pairs will be added to the Apertium machine translation platform in the 
> near future.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


