[ http://issues.apache.org/jira/browse/NUTCH-125?page=comments#action_12366063 ]
ilango gurusamy commented on NUTCH-125:
---------------------------------------

Hi Andrzej,

I want to try out this plugin. Which areas of this parser do you think need improvement or further additions, if any? I would be glad to help.

Thanks,
ilango

> OpenOffice Parser plugin
> ------------------------
>
>          Key: NUTCH-125
>          URL: http://issues.apache.org/jira/browse/NUTCH-125
>      Project: Nutch
>         Type: New Feature
>     Reporter: Andrzej Bialecki
>     Assignee: Andrzej Bialecki
>  Attachments: parse-oo.zip
>
> A simple parser for StarOffice SXW and OpenDocument ODT files. This plugin
> does not use the UNO bridge in OpenOffice, but rather uses standard
> ZipInputStream, and parses content.xml and meta.xml inside these files to
> extract metadata and plain text.
> This plugin uses dom4j, because of easy XPath node selection, but this
> dependency could be removed.
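
For anyone who wants to see the approach in the issue description without opening the attachment, here is a minimal sketch of it. It is not the code in parse-oo.zip: the class name OdtTextSketch and the method extractText are made up for illustration, and it uses the JDK's built-in DOM parser rather than dom4j. It assumes the document text lives in an entry named content.xml, as the description says.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical illustration only; not the plugin's actual class.
public class OdtTextSketch {

    /**
     * Returns the concatenated text of content.xml inside an SXW/ODT stream,
     * or null if the archive has no entry with that name.
     */
    public static String extractText(InputStream in) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"content.xml".equals(entry.getName())) {
                    continue; // skip meta.xml, styles.xml, images, ...
                }
                // Copy the entry into memory first, so the XML parser
                // never gets a chance to close the zip stream itself.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document doc = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(buf.toByteArray()));
                // getTextContent() joins all text nodes with no separator,
                // so paragraph boundaries are lost in this sketch.
                return doc.getDocumentElement().getTextContent().trim();
            }
        }
        return null;
    }
}
```

Calling OdtTextSketch.extractText(new FileInputStream("some.odt")) would return the raw text. Metadata from meta.xml could be read the same way by matching that entry name instead. Two things this sketch deliberately leaves out: inserting whitespace between paragraphs, and handling any DTD declaration that an input file's XML may carry.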
