[ https://jira.duraspace.org/browse/DS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Donohue updated DS-1140:
----------------------------
Status: Open (was: Received)
> Update MSWord Media Filter to use Apache POI (like PPT Filter) and also
> support .docx
> -------------------------------------------------------------------------------------
>
> Key: DS-1140
> URL: https://jira.duraspace.org/browse/DS-1140
> Project: DSpace
> Issue Type: Improvement
> Components: DSpace API
> Reporter: Tim Donohue
> Fix For: 3.0
>
>
> The Microsoft Word Media Filter (org.dspace.app.mediafilter.WordFilter) relies
> on obsolete third-party software, specifically the "text-mining" tools at:
> http://code.google.com/p/text-mining/
> Better options are now available, notably Apache POI:
> http://poi.apache.org/text-extraction.html
> Apache POI also has the benefit of being able to extract text from docx, xls,
> xlsx and even Publisher and Visio files.
> We may even be able to create a single "MSFilter" that extracts text from doc,
> docx, ppt, pptx, xls, xlsx, etc., all using POI.
> Any volunteers to implement this? It looks like we could implement it along
> the same lines as the current PPT Filter
> (org.dspace.app.mediafilter.PowerPointFilter), which already uses POI. See
> also DS-714.
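
To make the single "MSFilter" idea a little more concrete, here is a minimal sketch of the dispatch step only: deciding from a file name which POI component would handle the file. It is not DSpace or POI code. The class MSFilterSketch and its methods are hypothetical names chosen for illustration, and the sketch deliberately avoids calling POI so that it runs with the JDK alone. The component names (HWPF, XWPF, HSLF, XSLF, HSSF, XSSF, HPBF, HDGF) are the POI subprojects for the formats listed above; a real filter would presumably hand the bitstream to the matching POI text extractor.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: not part of DSpace or Apache POI.
final class MSFilterSketch {

    // File extension -> name of the POI component that reads that format.
    private static final Map<String, String> POI_COMPONENT = new LinkedHashMap<>();
    static {
        POI_COMPONENT.put("doc", "HWPF");   // Word 97-2003
        POI_COMPONENT.put("docx", "XWPF");  // Word OOXML
        POI_COMPONENT.put("ppt", "HSLF");   // PowerPoint 97-2003
        POI_COMPONENT.put("pptx", "XSLF");  // PowerPoint OOXML
        POI_COMPONENT.put("xls", "HSSF");   // Excel 97-2003
        POI_COMPONENT.put("xlsx", "XSSF");  // Excel OOXML
        POI_COMPONENT.put("pub", "HPBF");   // Publisher
        POI_COMPONENT.put("vsd", "HDGF");   // Visio
    }

    private MSFilterSketch() {
        // Static helpers only.
    }

    // Returns the lower-cased extension, or "" if the name has none.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Returns the POI component for the file, or null if the format is not covered.
    static String componentFor(String fileName) {
        return POI_COMPONENT.get(extensionOf(fileName));
    }

    // True if this sketch knows a POI component for the file.
    static boolean isSupported(String fileName) {
        return componentFor(fileName) != null;
    }
}
```

With this table, MSFilterSketch.componentFor("thesis.docx") yields "XWPF", while a file such as "notes.txt" yields null and would be left to another filter. Dispatching on the bitstream's registered format rather than on the file name may be a better fit for DSpace; the extension lookup is only the simplest stand-in.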
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://jira.duraspace.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira