[ https://issues.apache.org/jira/browse/LUCENE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675589#action_12675589 ]
Mark Miller commented on LUCENE-1284:
-------------------------------------

Hadn't seen this before. Thanks Felipe! This looks like a high-quality contribution. I've expanded the attached file into contrib, then built and ran the tests. Everything went smoothly.

I've only begun to look at the code myself, but a couple of initial comments:

Could you remove the @author tags? The Lucene project has decided it's best to leave them out (you can search the mailing list if you are interested in the discussion).

How about renaming overview.html to package.html and expanding what you have there? This looks like a very useful addition, but it's complicated enough to merit a more thorough overview and/or examples of how to get started. Not everyone wades into the contrib packages that often - let's hook those that do by providing a very clear: "This is what this is, this is what you can do with it, and here is how you do it". Nothing too intense, but enough to understand its usefulness quickly (and allow you to gauge the effort required for use).

As an example of seemingly missing info, I am wondering: where do I get the data files? I see a link to http://www.apertium.org, but digging a bit does not immediately show me what I am looking for. Clear instructions on where and how to obtain your preferred morphological data files, and on how to get going with them, would be great.

Thanks for donating this code! It's something I have been interested in seeing added to Lucene for some time.
- Mark

> Set of Java classes that allow the Lucene search engine to use morphological
> information developed for the Apertium open-source machine translation
> platform (http://www.apertium.org)
> ------------------------------------------------------------------------
>
> Key: LUCENE-1284
> URL: https://issues.apache.org/jira/browse/LUCENE-1284
> Project: Lucene - Java
> Issue Type: New Feature
> Environment: New feature developed under GNU/Linux, but it should
> work on any other Java-compliant platform
> Reporter: Felipe Sánchez Martínez
> Assignee: Otis Gospodnetic
> Attachments: apertium-morph.2008-05-19.tgz
>
>
> Set of Java classes that allow the Lucene search engine to use morphological
> information developed for the Apertium open-source machine translation
> platform (http://www.apertium.org). Morphological information is used to
> index new documents and to process smarter queries in which morphological
> attributes can be used to specify query terms.
> The tool makes use of morphological analyzers and dictionaries developed for
> the open-source machine translation platform Apertium (http://apertium.org)
> and, optionally, the part-of-speech taggers developed for it. Currently there
> are morphological dictionaries available for Spanish, Catalan, Galician,
> Portuguese, Aranese, Romanian, French and English. In addition, new
> dictionaries are being developed for Esperanto, Occitan, Basque, Swedish,
> Danish, Welsh, Polish and Italian, among others; we hope more language pairs
> will be added to the Apertium machine translation platform in the near future.