Sue,

A few responses inline...

On 3/13/2012 9:22 AM, Thornton, Susan M. (LARC-B702)[LITES] wrote:
> For Word docs:
>
> --------------
>
> * The rather outdated "Text-mining" tools at:
>
> http://code.google.com/p/text-mining/
>
> * Unfortunately it looks like these do NOT support docx
>
> * But, it looks like POI (used for PPTs, see below) does work for docx.
> Unfortunately, this is not enabled/built out in DSpace yet. I just
> created an issue for it at: https://jira.duraspace.org/browse/DS-1140
>
> *Great! Can you let us know when it’s been successfully implemented?*

You are welcome to subscribe to the JIRA ticket itself to receive
updates. Just log in to JIRA (it uses the same account as the DSpace
wiki) and click the "Watch" icon on the far right. You'll then get an
email any time that ticket is updated.

https://jira.duraspace.org/browse/DS-1140

Currently, we still need to find a volunteer developer to take on this
work, so I'm not sure how long it will be before it is implemented.

> For PPT:
>
> --------
>
> * POI 3.6: http://poi.apache.org/
>
> * This software supports pptx as well
>
> *How would I integrate this with DSpace version 1.7.1 to tell DSpace to
> use POI to filter .pptx files?*

This PPT/PPTX filter was first made available in DSpace 1.7.0, so it
should already work in your DSpace 1.7.1 installation. In your
dspace.cfg you'd just want to make sure the following is set up (it
should be by default):

'filter.plugins' setting: make sure this includes "PowerPoint Text
Extractor", as shown here:
https://fisheye3.atlassian.com/browse/dspace/dspace/trunk/dspace/config/dspace.cfg?hb=true#to400

'FormatFilter' setting: make sure it *defines* a "PowerPoint Text
Extractor", as shown here:
https://fisheye3.atlassian.com/browse/dspace/dspace/trunk/dspace/config/dspace.cfg?hb=true#to411

Finally, make sure the "PowerPoint Text Extractor" is set up to accept
two input formats, "Microsoft Powerpoint, Microsoft Powerpoint XML",
as shown here:
https://fisheye3.atlassian.com/browse/dspace/dspace/trunk/dspace/config/dspace.cfg?hb=true#to419
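
For reference, here is roughly what those three settings look like in
a stock 1.7.x dspace.cfg. Treat this as a sketch rather than an exact
copy: the property names and the other filters in the lists are as I
recall them from the default config, so your file may list different
filters or order them differently. The parts that matter are the
"PowerPoint Text Extractor" entries and the two input formats.

```properties
# Assumed default list; the key point is that it includes
# "PowerPoint Text Extractor".
filter.plugins = PDF Text Extractor, HTML Text Extractor, \
                 PowerPoint Text Extractor, \
                 Word Text Extractor, JPEG Thumbnail

# Maps each filter class to the name used in filter.plugins above.
# Assumed default mappings; check the PowerPointFilter line in particular.
plugin.named.org.dspace.app.mediafilter.FormatFilter = \
  org.dspace.app.mediafilter.PDFFilter = PDF Text Extractor, \
  org.dspace.app.mediafilter.HTMLFilter = HTML Text Extractor, \
  org.dspace.app.mediafilter.WordFilter = Word Text Extractor, \
  org.dspace.app.mediafilter.PowerPointFilter = PowerPoint Text Extractor, \
  org.dspace.app.mediafilter.JPEGFilter = JPEG Thumbnail

# The two input formats the PowerPoint filter should accept (ppt and pptx).
filter.org.dspace.app.mediafilter.PowerPointFilter.inputFormats = Microsoft Powerpoint, Microsoft Powerpoint XML
```

If your dspace.cfg differs from this, compare it against the linked
trunk version before changing anything.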

You'll then need to make sure your "Bitstream Format Registry" has a 
definition for "Microsoft Powerpoint XML" (pptx).

Again, assuming you are running on an out-of-the-box 1.7.x, all of the 
above settings should be enabled by default. So, it should just work.

- Tim






_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
