[ https://issues.apache.org/jira/browse/LUCENE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675589#action_12675589 ]

Mark Miller commented on LUCENE-1284:
-------------------------------------

Hadn't seen this before. Thanks Felipe! This looks like a high quality 
contribution.

I've expanded the attached file into contrib, built it, and ran the tests. 
Everything went smoothly.

I've only begun to look at the code myself, but here are a couple of initial comments:

Could you remove the @author tags? The Lucene project has decided it's best to 
leave them out (you can search the mailing list if you are interested in the 
discussion).

How about renaming overview.html to package.html and expanding what you have 
there? This looks like a very useful addition, but it's complicated enough to 
merit a more thorough overview and/or examples of how to get started. Not 
everyone wades into the contrib packages that often, so let's hook those who do 
by providing a very clear: "This is what this is, this is what you can do with 
it, and here is how you do it". Nothing too intense, but enough to understand 
its usefulness quickly (and to gauge the effort required to use it).

As an example of information that seems to be missing: where do I get the data 
files? I see a link to http://www.apertium.org, but digging around a bit does 
not immediately show me what I am looking for. Clear instructions on how to get 
going with your preferred morphological data files would be great (as well as 
clear instructions on where and how to obtain those files).

Thanks for donating this code! It's something I have been interested in seeing 
added to Lucene for some time.

- Mark

> Set of Java classes that allow the Lucene search engine to use morphological 
> information developed for the Apertium open-source machine translation 
> platform (http://www.apertium.org)
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1284
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1284
>             Project: Lucene - Java
>          Issue Type: New Feature
>         Environment: New feature developed under GNU/Linux, but it should 
> work on any other Java-compliant platform
>            Reporter: Felipe Sánchez Martínez
>            Assignee: Otis Gospodnetic
>         Attachments: apertium-morph.2008-05-19.tgz
>
>
> Set of Java classes that allow the Lucene search engine to use morphological 
> information developed for the Apertium open-source machine translation 
> platform (http://www.apertium.org). Morphological information is used to 
> index new documents and to process smarter queries in which morphological 
> attributes can be used to specify query terms.
> The tool makes use of morphological analyzers and dictionaries developed for 
> the open-source machine translation platform Apertium (http://apertium.org) 
> and, optionally, the part-of-speech taggers developed for it. Currently there 
> are morphological dictionaries available for Spanish, Catalan, Galician, 
> Portuguese, Aranese, Romanian, French and English. In addition, new 
> dictionaries are being developed for Esperanto, Occitan, Basque, Swedish, 
> Danish, Welsh, Polish and Italian, among others; we hope more language pairs 
> will be added to the Apertium machine translation platform in the near future.
