Thanks a bunch, Tim!  Our problem was that we had "Microsoft PowerPoint XML" 
(with a capital second "P" in "PowerPoint") as both the name and the 
description in our bitstreamformatregistry table.  Once I changed both to 
"Microsoft Powerpoint XML", our .pptx files filtered successfully!

Thanks again,

Sue
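
P.S. For anyone else who hits this, here is a minimal sketch of the kind of check that would have caught our mismatch. It assumes the media filter matches format names by exact, case-sensitive comparison, which is what the behavior above suggests but which I have not confirmed in the DSpace source. The function name and the sample lists are hypothetical; in practice the first list would come from the filter's input formats in dspace.cfg and the second from the names in the format registry.

```python
def find_case_mismatches(input_formats, registry_names):
    """Return (configured, registered) pairs that differ only by letter case."""
    exact = set(registry_names)
    # Map lowercased registry names back to how they are actually stored.
    by_lower = {name.lower(): name for name in registry_names}
    mismatches = []
    for fmt in input_formats:
        if fmt in exact:
            continue  # exact match, nothing to report
        near = by_lower.get(fmt.lower())
        if near is not None:
            mismatches.append((fmt, near))
    return mismatches


# Hypothetical values mirroring the situation in this thread.
input_formats = ["Microsoft Powerpoint", "Microsoft Powerpoint XML"]
registry_names = ["Microsoft Powerpoint", "Microsoft PowerPoint XML"]

for configured, registered in find_case_mismatches(input_formats, registry_names):
    print(f"configured {configured!r} but registry has {registered!r}")
```

Names that are missing from the registry altogether are deliberately not reported; the sketch only flags near-misses that differ in capitalization.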





Sue Walker-Thornton

(w):  (757) 864-2368

(m):  (757) 506-9903





-----Original Message-----
From: Tim Donohue [mailto:[email protected]]
Sent: Tuesday, March 13, 2012 5:36 PM
To: Thornton, Susan M. (LARC-B702)[LITES]
Cc: [email protected]; Dedmond, Nicole K. (LARC-B702)[LITES]
Subject: Re: [Dspace-tech] Are PDF-A documents filterable in DSpace?



Hi Sue,



The format registry should have two PPT entries:



PPT
---
Name: Microsoft Powerpoint
MimeType: application/vnd.ms-powerpoint
Description: Microsoft Powerpoint
Support Level: Known
File Extensions: ppt

PPTX
----
Name: Microsoft Powerpoint XML
MimeType: application/vnd.openxmlformats-officedocument.presentationml.presentation
Description: Microsoft Powerpoint XML
Support Level: Known
File Extensions: pptx



This information can also be found in the 
'[dspace]/config/registries/bitstream-formats.xml' file:

https://fisheye3.atlassian.com/browse/dspace/dspace/trunk/dspace/config/registries/bitstream-formats.xml?hb=true#to142





- Tim



On 3/13/2012 4:26 PM, Thornton, Susan M. (LARC-B702)[LITES] wrote:

> Hi Tim,

>       Can you give me a screen shot of your definition(s) for PowerPoint in 
> the bitstream_format_registry?  Something's still not working right and I 
> suspect it may be here.

> Thanks,

> Sue

>

>

> Sue Walker-Thornton

> (w):  (757) 864-2368

> (m):  (757) 506-9903

>

>

> -----Original Message-----

> From: Tim Donohue [mailto:[email protected]]

> Sent: Tuesday, March 13, 2012 10:44 AM

> To: Thornton, Susan M. (LARC-B702)[LITES]

> Cc: [email protected]; Dedmond, Nicole K. (LARC-B702)[LITES]

> Subject: Re: [Dspace-tech] Are PDF-A documents filterable in DSpace?

>

> Sue,

>

> A few responses inline...

>

> On 3/13/2012 9:22 AM, Thornton, Susan M. (LARC-B702)[LITES] wrote:

>> For Word docs:

>>

>> --------------

>>

>> * The rather outdated "Text-mining" tools at:

>>

>> http://code.google.com/p/text-mining/

>>

>> * Unfortunately it looks like these do NOT support docx

>>

>> * But, it looks like POI (used for PPTs, see below) does work for docx.

>> Unfortunately, this is not enabled/built out in DSpace yet. I just

>> created an issue for it at: https://jira.duraspace.org/browse/DS-1140

>>

>> *Great! Can you let us know when it's been successfully implemented?*

>

> You are welcome to subscribe to the JIRA ticket itself to receive updates. 
> Just log in to JIRA (it uses the same account as the DSpace wiki) and click 
> the "Watch" icon on the far right.  You'll then get an email any time that 
> ticket is updated.

>

> https://jira.duraspace.org/browse/DS-1140

>

> Currently, we need to locate a volunteer developer to take on this work.

> So, I'm not sure how long it will take before it is implemented.

>

>> For PPT:

>>

>> --------

>>

>> * POI 3.6: http://poi.apache.org/

>>

>> * This software supports pptx as well

>>

>> *How would I integrate this with DSpace version 1.7.1 to tell DSpace

>> to use POI to filter .pptx files?*

>

> This PPT/PPTX Filter was first made available in DSpace 1.7.0.  So, it should 
> already work in your DSpace 1.7.1 installation.  In your dspace.cfg you'd 
> just want to make sure the following is set up (it should be by default):

>

> 'filter.plugins' setting: make sure this includes "PowerPoint Text 
> Extractor", like displayed here:

> https://fisheye3.atlassian.com/browse/dspace/dspace/trunk/dspace/config/dspace.cfg?hb=true#to400

>

> 'FormatFilter' setting: make sure it *defines* a "PowerPoint Text Extractor", 
> like displayed here:

> https://fisheye3.atlassian.com/browse/dspace/dspace/trunk/dspace/config/dspace.cfg?hb=true#to411

>

> Finally, make sure the "PowerPoint Text Extractor" is set up to take in two 
> input formats: "Microsoft Powerpoint, Microsoft Powerpoint XML", like 
> displayed here:

> https://fisheye3.atlassian.com/browse/dspace/dspace/trunk/dspace/config/dspace.cfg?hb=true#to419

>

> You'll then need to make sure your "Bitstream Format Registry" has a 
> definition for "Microsoft Powerpoint XML" (pptx).

>

> Again, assuming you are running on an out-of-the-box 1.7.x, all of the above 
> settings should be enabled by default. So, it should just work.

>

> - Tim

>

>

>

>

>