Thanks a bunch Tim! Our problem was we had "Microsoft PowerPoint XML" (with a
capital "P" on the second "p" in "Powerpoint") for both the name and the
description in our bitstreamformatregistry table. Once I changed both to
"Microsoft Powerpoint XML", our .pptx files filtered successfully!
Thanks again,
Sue
Sue Walker-Thornton
(w): (757) 864-2368
(m): (757) 506-9903
-----Original Message-----
From: Tim Donohue [mailto:[email protected]]
Sent: Tuesday, March 13, 2012 5:36 PM
To: Thornton, Susan M. (LARC-B702)[LITES]
Cc: [email protected]; Dedmond, Nicole K. (LARC-B702)[LITES]
Subject: Re: [Dspace-tech] Are PDF-A documents filterable in DSpace?
Hi Sue,
The format registry should have two PPT entries:
PPT
---
Name: Microsoft Powerpoint
MimeType: application/vnd.ms-powerpoint
Description: Microsoft Powerpoint
Support Level: Known
File Extensions: ppt
PPTX
----
Name: Microsoft Powerpoint XML
MimeType:
application/vnd.openxmlformats-officedocument.presentationml.presentation
Description: Microsoft Powerpoint XML
Support Level: Known
File Extensions: pptx
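
For anyone comparing their own registry against the two entries above, here is a minimal sketch of a check for names that differ only by letter case. It assumes the filter's input format names must match the registry names by exact, case-sensitive string comparison, which is what the behavior reported in this thread suggests but is not confirmed from the DSpace source. The function name and the sample values are hypothetical and for illustration only.

```python
def find_name_mismatches(input_formats, registry_names):
    """Return (configured_name, near_miss) pairs for names with no exact match.

    near_miss is a registry name that differs only by letter case, or None
    if nothing similar is registered.
    """
    by_lowercase = {name.lower(): name for name in registry_names}
    mismatches = []
    for configured in input_formats:
        if configured in registry_names:
            continue  # exact, case-sensitive match: nothing to report
        mismatches.append((configured, by_lowercase.get(configured.lower())))
    return mismatches


# Hypothetical values: the filter's two input formats, and a registry whose
# pptx entry was typed with a capital "P" in "PowerPoint".
configured = [name.strip() for name in
              "Microsoft Powerpoint, Microsoft Powerpoint XML".split(",")]
registry = {"Microsoft Powerpoint", "Microsoft PowerPoint XML"}

for wanted, near_miss in find_name_mismatches(configured, registry):
    print(f"No exact registry match for {wanted!r}; closest entry: {near_miss!r}")
```

In practice the registry names would come from your own format registry rather than a hard-coded set.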
This information can also be found in the
'[dspace]/config/registries/bitstream-formats.xml' file:
https://fisheye3.atlassian.com/browse/dspace/dspace/trunk/dspace/config/registries/bitstream-formats.xml?hb=true#to142
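
If it helps, below is a rough sketch of reading the format names out of that XML file by extension. The element names used here (bitstream-type, short_description, extension) and the idea that short_description holds the "Name" are assumptions about the file layout, so please check them against your own copy. The inline sample stands in for the real file and only carries the two PowerPoint entries listed above.

```python
import xml.etree.ElementTree as ET

# Assumed layout of bitstream-formats.xml, reduced to the two entries above.
SAMPLE_XML = """
<dspace-bitstream-types>
  <bitstream-type>
    <mimetype>application/vnd.ms-powerpoint</mimetype>
    <short_description>Microsoft Powerpoint</short_description>
    <description>Microsoft Powerpoint</description>
    <extension>ppt</extension>
  </bitstream-type>
  <bitstream-type>
    <mimetype>application/vnd.openxmlformats-officedocument.presentationml.presentation</mimetype>
    <short_description>Microsoft Powerpoint XML</short_description>
    <description>Microsoft Powerpoint XML</description>
    <extension>pptx</extension>
  </bitstream-type>
</dspace-bitstream-types>
"""


def names_by_extension(xml_text):
    """Map each file extension to the format name registered for it."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for entry in root.iter("bitstream-type"):
        name = entry.findtext("short_description", default="").strip()
        for ext in entry.findall("extension"):
            result[(ext.text or "").strip()] = name
    return result


for extension, name in sorted(names_by_extension(SAMPLE_XML).items()):
    print(f"{extension}: {name}")
```

The names printed for ppt and pptx can then be compared character by character with what is in your database.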
- Tim
On 3/13/2012 4:26 PM, Thornton, Susan M. (LARC-B702)[LITES] wrote:
> Hi Tim,
> Can you give me a screen shot of your definition(s) for PowerPoint in
> the bitstream_format_registry? Something's still not working right and I
> suspect it may be here.
> Thanks,
> Sue
>
>
> Sue Walker-Thornton
> (w): (757) 864-2368
> (m): (757) 506-9903
>
>
> -----Original Message-----
> From: Tim Donohue [mailto:[email protected]]
> Sent: Tuesday, March 13, 2012 10:44 AM
> To: Thornton, Susan M. (LARC-B702)[LITES]
> Cc: [email protected]; Dedmond, Nicole K. (LARC-B702)[LITES]
> Subject: Re: [Dspace-tech] Are PDF-A documents filterable in DSpace?
>
> Sue,
>
> A few responses inline...
>
> On 3/13/2012 9:22 AM, Thornton, Susan M. (LARC-B702)[LITES] wrote:
>> For Word docs:
>>
>> --------------
>>
>> * The rather outdated "Text-mining" tools at:
>>
>> http://code.google.com/p/text-mining/
>>
>> * Unfortunately it looks like these do NOT support docx
>>
>> * But, it looks like POI (used for PPTs, see below) does work for docx.
>> Unfortunately, this is not enabled/built out in DSpace yet. I just
>> created an issue for it at: https://jira.duraspace.org/browse/DS-1140
>>
>> *Great! Can you let us know when it's been successfully implemented?*
>
> You are welcome to subscribe to the JIRA ticket itself to receive updates.
> Just log in to JIRA (it uses the same account as the DSpace wiki) and click
> the "Watch" icon at the far right. You'll then get an email any time that
> ticket is updated.
>
> https://jira.duraspace.org/browse/DS-1140
>
> Currently, we need to locate a volunteer developer to take on this work.
> So, I'm not sure how long it will take before it is implemented.
>
>> For PPT:
>>
>> --------
>>
>> * POI 3.6: http://poi.apache.org/
>>
>> * This software supports pptx as well
>>
>> *How would I integrate this with DSpace version 1.7.1 to tell DSpace
>> to use POI to filter .pptx files?*
>
> This PPT/PPTX Filter was first made available in DSpace 1.7.0. So, it should
> already work in your DSpace 1.7.1 installation. In your dspace.cfg you'd
> just want to make sure the following is set up (it should be by default):
>
> 'filter.plugins' setting: make sure this includes "PowerPoint Text
> Extractor", like displayed here:
> https://fisheye3.atlassian.com/browse/dspace/dspace/trunk/dspace/config/dspace.cfg?hb=true#to400
>
> 'FormatFilter' setting: make sure it *defines* a "PowerPoint Text Extractor",
> like displayed here:
> https://fisheye3.atlassian.com/browse/dspace/dspace/trunk/dspace/config/dspace.cfg?hb=true#to411
>
> Finally, make sure the "PowerPoint Text Extractor" is set up to take in two
> input formats: "Microsoft Powerpoint, Microsoft Powerpoint XML", like
> displayed here:
> https://fisheye3.atlassian.com/browse/dspace/dspace/trunk/dspace/config/dspace.cfg?hb=true#to419
>
> You'll then need to make sure your "Bitstream Format Registry" has a
> definition for "Microsoft Powerpoint XML" (pptx).
>
> Again, assuming you are running on an out-of-the-box 1.7.x, all of the above
> settings should be enabled by default. So, it should just work.
>
> - Tim