BTW, you also need to modify the mediafilter.cfg file for the new filter to take effect.
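As a rough illustration only: assuming the DSpace 1.x mediafilter.cfg layout of one "filter class = format list" line per filter, and assuming the PowerPoint format is registered in your bitstream format registry under the short description "Microsoft Powerpoint" (check your own registry, the name may differ), the added line might look like this:

```properties
# Existing entry, shown only for context (assumed layout)
org.dspace.app.mediafilter.WordFilter = Microsoft Word

# Assumed new entry: run PPTFilter on bitstreams whose format
# short description is "Microsoft Powerpoint"
org.dspace.app.mediafilter.PPTFilter = Microsoft Powerpoint
```

After editing the file, filter-media has to be run again so the new filter is applied.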
Guang Huang wrote:
> In fact, the Apache Jakarta POI project already focuses on reading
> Microsoft format files. poi-scratchpad-3.0-alpha2 has functions to
> parse ppt files. We already use them in our project for parsing Word,
> PowerPoint, and Excel files.
>
> Attached is a sample PPTFilter.java; you can put it under the
> org.dspace.app.mediafilter folder. (You need to download the
> poi-3.0-alpha2, poi-contrib-3.0-alpha2, and poi-scratchpad-3.0-alpha2
> jar files from the POI web site and put them in the lib folder.)
>
> Thanks
>
> Guang
>
> Pan Family wrote:
>> Hi,
>>
>> I submitted an MS ppt file to my collection, but filter-media
>> does not index it.  I tried shutting down the database
>> (PostgreSQL) and restarting it, and ran filter-media several
>> times, but it did not help.  I made sure that the ppt file is
>> indeed in the collection by opening it using View/Open.
>>
>> I have no problem indexing MS Word, text, html, or pdf
>> files.  Do I need to do anything special for ppt files?
>>
>> Thanks a lot!
>>
>> -Pan
>>
>>
>> _______________________________________________
>> DSpace-tech mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>>   
>
> ------------------------------------------------------------------------
>
> package org.dspace.app.mediafilter;
>
> import java.io.ByteArrayInputStream;
> import java.io.InputStream;
>
> import org.apache.poi.hslf.extractor.PowerPointExtractor;
> import org.dspace.app.mediafilter.MediaFilter;
> import org.dspace.app.mediafilter.MediaFilterManager;
>
> /**
>  * Media filter for PPT file. 
>  * 
>  * @author Guang Huang
>  *
>  */
> public class PPTFilter extends MediaFilter
> {
>
>     public String getBundleName()
>     {
>         return "TEXT";
>     }
>
>     public String getDescription()
>     {
>         return "Extracted text";
>     }
>
>     public InputStream getDestinationStream(InputStream source)
>             throws Exception
>     {
>         // Note from Guang Huang: it appears the PowerPoint extractor
>         // does not need to be closed here; closing the input stream
>         // <code>source</code> should close the extractor as well.
>         String extractedText = new PowerPointExtractor(source).getText();
>
>         // if verbose flag is set, print out extracted text
>         // to STDOUT
>         if (MediaFilterManager.isVerbose)
>         {
>             System.out.println(extractedText);
>         }
>
>         // generate an input stream with the extracted text
>         byte[] textBytes = extractedText.getBytes();
>         ByteArrayInputStream bais = new ByteArrayInputStream(textBytes);
>
>         // The stream holds its own reference to textBytes, so the array
>         // stays reachable after this method returns.
>         return bais;
>     }
>
>     public String getFilteredName(String sourceName)
>     {
>         return sourceName + ".txt";
>     }
>
>     public String getFormatString()
>     {
>         return "Text";
>     }
>
> }
>   
> ------------------------------------------------------------------------
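
On whether the byte array survives after getDestinationStream returns: a minimal, stdlib-only sketch (not part of DSpace or POI; the class and method names below are hypothetical) suggests it does, because the ByteArrayInputStream keeps its own reference to the array. The sketch also passes an explicit charset to getBytes, which is an assumption on my part rather than something the attached filter does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
final class TextStreamSketch {

    // Mirrors the tail of getDestinationStream: wrap extracted text in a stream.
    // textBytes is a local variable, but the returned stream references the array.
    static InputStream toStream(String extractedText) {
        byte[] textBytes = extractedText.getBytes(StandardCharsets.UTF_8);
        return new ByteArrayInputStream(textBytes);
    }

    // Reads the whole stream back into a String, decoding as UTF-8.
    static String readAll(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The text is still readable after toStream has returned.
        System.out.println(readAll(toStream("Slide 1: caf\u00e9")));
    }
}
```

Using an explicit charset keeps the extracted text independent of the platform default encoding; whether the rest of the indexing pipeline expects UTF-8 is something to verify for your installation.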
