[ https://jira.duraspace.org/browse/DS-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Donohue updated DS-1140:
----------------------------

    Status: Open  (was: Received)
    
> Update MSWord Media Filter to use Apache POI (like PPT Filter) and also 
> support .docx
> -------------------------------------------------------------------------------------
>
>                 Key: DS-1140
>                 URL: https://jira.duraspace.org/browse/DS-1140
>             Project: DSpace
>          Issue Type: Improvement
>          Components: DSpace API
>            Reporter: Tim Donohue
>             Fix For: 3.0
>
>
> The Microsoft Word Media Filter (org.dspace.app.mediafilter.WordFilter) uses 
> outdated, obsolete third-party software, specifically the "text-mining" tools 
> at: http://code.google.com/p/text-mining/
> However, there are now better options available, especially Apache POI:
> http://poi.apache.org/text-extraction.html
> Apache POI also has the benefit of being able to extract text from docx, xls, 
> xlsx and even Publisher and Visio files.
> We may even be able to create a single "MSFilter" which can just extract doc, 
> docx, ppt, pptx, xls, xlsx, etc. all using POI.
> Any volunteers to implement? It looks like we should be able to implement it 
> similarly to the current PPT Filter 
> (org.dspace.app.mediafilter.PowerPointFilter), which already uses POI. See 
> also DS-714.
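
A single POI-based "MSFilter" would first have to tell the two Office container families apart. Legacy binary files (.doc, .ppt, .xls) are OLE2 compound documents, which POI reads through HWPF, HSLF and HSSF (e.g. HWPF's WordExtractor). The newer .docx, .pptx and .xlsx files are OOXML ZIP packages, which POI reads through XWPF, XSLF and XSSF (e.g. XWPFWordExtractor). The sketch below is plain Java with no POI dependency and only illustrates that dispatch step by checking leading magic bytes. The class, enum and method names are hypothetical, and this is not a proposed DSpace implementation. POI's own ExtractorFactory already performs this kind of detection when choosing an extractor.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of DSpace or Apache POI.
final class OfficeFormatSniffer {

    /** Container families a combined POI-based filter would need to tell apart. */
    enum Container { OLE2, ZIP, UNKNOWN }

    // OLE2 Compound File header, shared by legacy .doc, .ppt and .xls files.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header "PK\3\4". OOXML files (.docx, .pptx, .xlsx) are ZIP
    // packages, but so are other formats (ODF, plain .zip), so a real filter
    // would still have to confirm the package contents before using XWPF etc.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Classifies the leading bytes of a bitstream by container signature. */
    static Container sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Container.ZIP;
        return Container.UNKNOWN;
    }

    /** Reads up to 8 leading bytes, enough for the longer (OLE2) signature. */
    static byte[] readHeader(InputStream in) throws IOException {
        byte[] buf = new byte[OLE2_MAGIC.length];
        int n = in.readNBytes(buf, 0, buf.length); // Java 9+; returns 0 at EOF
        return Arrays.copyOf(buf, n);
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: OfficeFormatSniffer <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(sniff(readHeader(in)));
        }
    }
}
```

Running it on a .doc file should print OLE2, and on a .docx file should print ZIP. A filter would then hand the stream to the matching POI extractor.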

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://jira.duraspace.org/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

_______________________________________________
Dspace-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-devel
