Hi Tim,
Can you send me a screenshot of your definition(s) for PowerPoint in the
bitstream_format_registry? Something's still not working right, and I suspect
the problem may be there.
Thanks,
Sue
Sue Walker-Thornton
(w): (757) 864-2368
(m): (757) 506-9903
-----Original Message-----
From: Tim Donohue [mailto:[email protected]]
Sent: Tuesday, March 13, 2012 10:44 AM
To: Thornton, Susan M. (LARC-B702)[LITES]
Cc: [email protected]; Dedmond, Nicole K. (LARC-B702)[LITES]
Subject: Re: [Dspace-tech] Are PDF-A documents filterable in DSpace?
Sue,
A few responses inline...
On 3/13/2012 9:22 AM, Thornton, Susan M. (LARC-B702)[LITES] wrote:
> For Word docs:
> --------------
> * The rather outdated "Text-mining" tools at:
>   http://code.google.com/p/text-mining/
> * Unfortunately, it looks like these do NOT support docx.
> * But it looks like POI (used for PPTs, see below) does work for docx.
>   Unfortunately, this is not enabled/built out in DSpace yet. I just
>   created an issue for it at: https://jira.duraspace.org/browse/DS-1140
>
> *Great! Can you let us know when it's been successfully implemented?*
You are welcome to subscribe to the JIRA ticket itself to receive updates. Just
log in to JIRA (it uses the same account as the DSpace wiki) and click the
"Watch" icon on the far right. You'll then get an email any time that ticket is
updated:
https://jira.duraspace.org/browse/DS-1140
Currently, we need to find a volunteer developer to take on this work, so I'm
not sure how long it will be before it is implemented.
> For PPT:
> --------
> * POI 3.6: http://poi.apache.org/
> * This software supports pptx as well.
>
> *How would I integrate this with DSpace version 1.7.1 to tell DSpace
> to use POI to filter .pptx files?*
This PPT/PPTX filter was first made available in DSpace 1.7.0, so it should
already work in your DSpace 1.7.1 installation. In your dspace.cfg, just make
sure the following is set up (it should be by default):
'filter.plugins' setting: make sure this includes "PowerPoint Text Extractor",
as shown here:
https://fisheye3.atlassian.com/browse/dspace/dspace/trunk/dspace/config/dspace.cfg?hb=true#to400
'FormatFilter' setting (plugin.named.org.dspace.app.mediafilter.FormatFilter):
make sure it *defines* a "PowerPoint Text Extractor", as shown here:
https://fisheye3.atlassian.com/browse/dspace/dspace/trunk/dspace/config/dspace.cfg?hb=true#to411
Finally, make sure the "PowerPoint Text Extractor" is set up to accept two
input formats, "Microsoft Powerpoint, Microsoft Powerpoint XML", as shown here:
https://fisheye3.atlassian.com/browse/dspace/dspace/trunk/dspace/config/dspace.cfg?hb=true#to419
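If the links above are unavailable, the three settings can also be checked with a small script. The sketch below is a hypothetical helper (not part of DSpace) that reads dspace.cfg-style text and reports which of the three PowerPoint settings look incomplete. It handles only a simplified subset of the properties syntax (key = value, '#' comments, trailing-backslash continuations), and the embedded SAMPLE is an illustrative excerpt assumed to resemble the stock 1.7 settings, so compare against your own file rather than copying it.

```python
# Hypothetical checker for the PowerPoint filter settings in dspace.cfg.
# Only a simplified subset of .properties syntax is parsed.

FILTER_NAME = "PowerPoint Text Extractor"
FILTER_CLASS = "org.dspace.app.mediafilter.PowerPointFilter"
REQUIRED_INPUTS = {"Microsoft Powerpoint", "Microsoft Powerpoint XML"}

# Illustrative excerpt only (assumed to resemble the 1.7 defaults).
SAMPLE = """
# compare with your own [dspace]/config/dspace.cfg
filter.plugins = PDF Text Extractor, HTML Text Extractor, \\
                 Word Text Extractor, PowerPoint Text Extractor
plugin.named.org.dspace.app.mediafilter.FormatFilter = \\
    org.dspace.app.mediafilter.PDFFilter = PDF Text Extractor, \\
    org.dspace.app.mediafilter.PowerPointFilter = PowerPoint Text Extractor
filter.org.dspace.app.mediafilter.PowerPointFilter.inputFormats = Microsoft Powerpoint, Microsoft Powerpoint XML
"""


def parse_cfg(text):
    """Return {key: value}, joining lines that end with a backslash."""
    props = {}
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments unless we are mid-continuation.
        if not pending and (not line or line.startswith("#")):
            continue
        if line.endswith("\\"):
            pending += line[:-1] + " "
            continue
        line = pending + line
        pending = ""
        if "=" in line:
            key, value = line.split("=", 1)  # split on the first '=' only
            props[key.strip()] = " ".join(value.split())
    return props


def split_list(value):
    """Split a comma-separated setting into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_powerpoint(props):
    """Return a list of human-readable problems (empty if all look set)."""
    problems = []
    # 1. filter.plugins must list the filter by name.
    if FILTER_NAME not in split_list(props.get("filter.plugins", "")):
        problems.append("filter.plugins does not list " + FILTER_NAME)
    # 2. The FormatFilter setting must map the class to that name.
    named = split_list(
        props.get("plugin.named.org.dspace.app.mediafilter.FormatFilter", ""))
    pairs = {cls.strip(): name.strip()
             for cls, _, name in (entry.partition("=") for entry in named)}
    if pairs.get(FILTER_CLASS) != FILTER_NAME:
        problems.append("FormatFilter does not define " + FILTER_NAME)
    # 3. The filter must accept both PowerPoint input formats.
    inputs = set(split_list(
        props.get("filter." + FILTER_CLASS + ".inputFormats", "")))
    missing = REQUIRED_INPUTS - inputs
    if missing:
        problems.append("inputFormats missing: " + ", ".join(sorted(missing)))
    return problems


if __name__ == "__main__":
    # Checks the built-in SAMPLE; pass your own file's text instead.
    for problem in check_powerpoint(parse_cfg(SAMPLE)) or ["all three settings present"]:
        print(problem)
```

To check a real installation, pass the text of your own dspace.cfg to parse_cfg() instead of SAMPLE.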
You'll then need to make sure your "Bitstream Format Registry" has a definition
for "Microsoft Powerpoint XML" (pptx).
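When comparing registry definitions, one detail worth checking is that the names match character for character. My understanding (an assumption worth verifying against the 1.7 media filter source) is that each inputFormats entry is compared exactly against a bitstream format's name, so an entry called "Microsoft PowerPoint XML" would not match "Microsoft Powerpoint XML". The hypothetical sketch below illustrates that comparison. The BitstreamFormat class and the sample rows are illustrative stand-ins, not DSpace's API; the MIME type is the standard one for .pptx files.

```python
from dataclasses import dataclass, field
from typing import List

# The two names from the PowerPoint filter's inputFormats setting.
INPUT_FORMATS = ["Microsoft Powerpoint", "Microsoft Powerpoint XML"]
PPTX_MIME = "application/vnd.openxmlformats-officedocument.presentationml.presentation"


@dataclass
class BitstreamFormat:
    """Hypothetical stand-in for one row of the Bitstream Format Registry."""
    name: str                                   # the format's name/short description
    mimetype: str
    extensions: List[str] = field(default_factory=list)


def find_format_problems(registry, input_formats):
    """List mismatches between inputFormats and the registry entries."""
    problems = []
    by_name = {fmt.name: fmt for fmt in registry}
    for wanted in input_formats:
        # Assumed exact, case-sensitive comparison of names.
        if wanted not in by_name:
            problems.append("no registry format named exactly %r" % wanted)
    pptx = by_name.get("Microsoft Powerpoint XML")
    if pptx is not None:
        if pptx.mimetype != PPTX_MIME:
            problems.append("unexpected MIME type for pptx: " + pptx.mimetype)
        if "pptx" not in pptx.extensions:
            problems.append("extension 'pptx' is not listed")
    return problems


# Illustrative registry whose pptx entry uses different capitalization.
REGISTRY = [
    BitstreamFormat("Microsoft Powerpoint", "application/vnd.ms-powerpoint", ["ppt"]),
    BitstreamFormat("Microsoft PowerPoint XML", PPTX_MIME, ["pptx"]),
]

if __name__ == "__main__":
    for problem in find_format_problems(REGISTRY, INPUT_FORMATS):
        print(problem)
```

With the sample rows above, the only reported problem is the missing exact match for "Microsoft Powerpoint XML".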
Again, assuming you are running an out-of-the-box 1.7.x installation, all of the
above settings should be enabled by default, so it should just work.
- Tim
_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech