[
https://issues.apache.org/jira/browse/LUCENE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996413#comment-12996413
]
Kevin Brubeck Unhammer commented on LUCENE-1284:
------------------------------------------------
A little update: The Java port of lttoolbox has been complete for some time
now, and the port of apertium-tagger at least does disambiguation (training of
models is not supported yet though):
{noformat}
$ echo 'jeg' |apertium-destxt-j |lt-proc-j nb-nn.automorf.bin |
apertium-tagger-j -g nb-nn.prob -f
^jeg/jeg<prn><p1><mf><sg><nom>/jeg<n><nt><sg><ind>$^./.<sent><clb>$[][
]
{noformat}
The GSoC student Stephen Tigner is currently working on making sure the ports
are all usable as libraries; from what I understand, only minor cleanup
work is left on that.
I can't say anything about the license issue, though. Other than Stephen Tigner,
the most active contributor to the port is Jacob Nordfalk.
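For anyone who wants to consume the tagger output shown above from Java code,
here is a minimal sketch of splitting that stream format into surface forms,
lemmas and tags. It is only an illustration: the class name LexicalUnitSketch
and its methods are hypothetical and are not part of the lttoolbox or
apertium-tagger ports, and it assumes the input contains no escaped
characters (no backslash-escaped ^, $, / or angle brackets). Text outside
^...$ units, such as the [] blanks, is simply skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any Apertium port: splits stream-format
// text such as ^surface/analysis1/analysis2$ into its pieces.
// Assumption: no escaped characters appear in the input.
class LexicalUnitSketch {

    /** Surface form plus the analyses listed for one ^...$ unit. */
    static final class Unit {
        final String surface;
        final List<String> analyses;

        Unit(String surface, List<String> analyses) {
            this.surface = surface;
            this.analyses = analyses;
        }
    }

    /** Collects every ^...$ unit in the text; anything outside units is ignored. */
    static List<Unit> parse(String stream) {
        List<Unit> units = new ArrayList<>();
        int start = stream.indexOf('^');
        while (start >= 0) {
            int end = stream.indexOf('$', start);
            if (end < 0) {
                break; // unterminated unit: stop rather than guess
            }
            // First field is the surface form, the rest are analyses.
            String[] parts = stream.substring(start + 1, end).split("/");
            List<String> analyses = new ArrayList<>();
            for (int i = 1; i < parts.length; i++) {
                analyses.add(parts[i]);
            }
            units.add(new Unit(parts[0], analyses));
            start = stream.indexOf('^', end);
        }
        return units;
    }

    /** The lemma is the text before the first tag. */
    static String lemma(String analysis) {
        int lt = analysis.indexOf('<');
        return lt < 0 ? analysis : analysis.substring(0, lt);
    }

    /** The tags are the names between each pair of angle brackets. */
    static List<String> tags(String analysis) {
        List<String> tags = new ArrayList<>();
        int lt = analysis.indexOf('<');
        while (lt >= 0) {
            int gt = analysis.indexOf('>', lt);
            if (gt < 0) {
                break; // unterminated tag
            }
            tags.add(analysis.substring(lt + 1, gt));
            lt = analysis.indexOf('<', gt);
        }
        return tags;
    }

    public static void main(String[] args) {
        String line =
            "^jeg/jeg<prn><p1><mf><sg><nom>/jeg<n><nt><sg><ind>$^./.<sent><clb>$[][";
        for (Unit unit : parse(line)) {
            for (String analysis : unit.analyses) {
                System.out.println(
                    unit.surface + "\t" + lemma(analysis) + "\t" + tags(analysis));
            }
        }
    }
}
```

Run on the example line, this prints one row per analysis (two for "jeg", one
for the full stop), each with the surface form, the lemma and the tag list.
An analyzer integration would presumably feed such lemma and tag pairs into
the index instead of printing them.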
> Set of Java classes that allow the Lucene search engine to use morphological
> information developed for the Apertium open-source machine translation
> platform (http://www.apertium.org)
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-1284
> URL: https://issues.apache.org/jira/browse/LUCENE-1284
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/analyzers
> Environment: New feature developed under GNU/Linux, but it should
> work on any other Java-compliant platform
> Reporter: Felipe Sánchez Martínez
> Assignee: Otis Gospodnetic
> Attachments: apertium-morph.0.9.0.tgz
>
>
> Set of Java classes that allow the Lucene search engine to use morphological
> information developed for the Apertium open-source machine translation
> platform (http://www.apertium.org). Morphological information is used to
> index new documents and to process smarter queries in which morphological
> attributes can be used to specify query terms.
> The tool makes use of morphological analyzers and dictionaries developed for
> the open-source machine translation platform Apertium (http://apertium.org)
> and, optionally, the part-of-speech taggers developed for it. Currently there
> are morphological dictionaries available for Spanish, Catalan, Galician,
> Portuguese, Aranese, Romanian, French and English. In addition, new
> dictionaries are being developed for Esperanto, Occitan, Basque, Swedish,
> Danish, Welsh, Polish and Italian, among others; we hope more language pairs
> will be added to the Apertium machine translation platform in the near future.