BTW, you also need to modify the mediafilter.cfg file for it to take effect.

Guang Huang wrote:
> In fact, the Apache Jakarta POI project already focuses on accessing
> Microsoft format files. poi-scratchpad-3.0-alpha2 has functions to
> parse PPT files. We already use them in our project for parsing Word,
> PPT, and Excel files.
>
> Attached is a sample PPTFilter.java; you can put it under the
> org.dspace.app.mediafilter folder. (You need to download the
> poi-3.2-alpha2, poi-contrib-3.0-alpha2, and poi-scratchpad-3.0-alpha jar
> files from the POI web site and put them in the lib folder.)
>
> Thanks
>
> Guang
>
> Pan Family wrote:
>> Hi,
>>
>> I submitted an MS PPT file to my collection, but filter-media
>> does not want to index this PPT file. I tried shutting down
>> the database (PostgreSQL) and restarting it, and ran
>> filter-media several times, but it did not help. I made
>> sure that this PPT file is indeed in the collection by opening
>> it using View/Open.
>>
>> I have no problem indexing MS Word, text, HTML, or PDF
>> files. Do I need to do anything special for PPT files?
>>
>> Thanks a lot!
>>
>> -Pan
>>
>>
>> ------------------------------------------------------------------------
>>
>> -------------------------------------------------------------------------
>>
>> Using Tomcat but need to do more? Need to support web services,
>> security?
>> Get stuff done quickly with pre-integrated technology to make your
>> job easier.
>> Download IBM WebSphere Application Server v.1.0.1 based on Apache
>> Geronimo
>> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> DSpace-tech mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>>
>
> ------------------------------------------------------------------------
>
> package org.dspace.app.mediafilter;
>
> import java.io.ByteArrayInputStream;
> import java.io.InputStream;
>
> import org.apache.poi.hslf.extractor.PowerPointExtractor;
>
> /**
>  * Media filter for PPT (Microsoft PowerPoint) files: extracts the
>  * slide text so it can be indexed.
>  *
>  * @author Guang Huang
>  */
> public class PPTFilter extends MediaFilter
> {
>     // Bundle in which the extracted-text bitstream is stored
>     public String getBundleName()
>     {
>         return "TEXT";
>     }
>
>     // Description attached to the generated bitstream
>     public String getDescription()
>     {
>         return "Extracted text";
>     }
>
>     public InputStream getDestinationStream(InputStream source)
>         throws Exception
>     {
>         // (Guang Huang) The PowerPointExtractor does not need to be
>         // closed here; closing the input stream source closes it.
>         String extractedText = new PowerPointExtractor(source).getText();
>
>         // If the verbose flag is set, print the extracted text to STDOUT
>         if (MediaFilterManager.isVerbose)
>         {
>             System.out.println(extractedText);
>         }
>
>         // Generate an input stream containing the extracted text.
>         // Note: getBytes() uses the platform's default character encoding.
>         byte[] textBytes = extractedText.getBytes();
>
>         // Returning the stream is safe: the ByteArrayInputStream holds a
>         // reference to textBytes, so the array remains reachable after
>         // this method returns.
>         return new ByteArrayInputStream(textBytes);
>     }
>
>     // Name of the derived bitstream, e.g. "slides.ppt" -> "slides.ppt.txt"
>     public String getFilteredName(String sourceName)
>     {
>         return sourceName + ".txt";
>     }
>
>     // Bitstream format name of the generated file
>     public String getFormatString()
>     {
>         return "Text";
>     }
> }
>
> ------------------------------------------------------------------------
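On the question in the filter's code of whether the byte array goes out of scope when getDestinationStream() returns: the local variable does, but the array itself stays reachable because the ByteArrayInputStream keeps a reference to it. The standalone sketch below uses no DSpace or POI classes; the class name TextStreamDemo and its methods are hypothetical. It mirrors the last steps of the filter and then reads the text back from the returned stream. It encodes and decodes with an explicit UTF-8 charset only so the round trip is deterministic; that is an assumption of the sketch, not what PPTFilter itself does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class TextStreamDemo {

    // Mirrors the end of getDestinationStream(): wrap extracted text in an
    // in-memory stream. textBytes goes out of scope when this returns, but
    // the ByteArrayInputStream still references the array, so it is not lost.
    static InputStream toStream(String extractedText) {
        // Assumption for this sketch: explicit UTF-8 instead of the default charset
        byte[] textBytes = extractedText.getBytes(StandardCharsets.UTF_8);
        return new ByteArrayInputStream(textBytes);
    }

    // Reads a stream to the end and decodes it as UTF-8, the way a caller
    // storing the derived text bitstream might consume it.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The stream is used after toStream() has returned; the text is intact.
        InputStream stream = toStream("Slide 1: Quarterly results");
        System.out.println(readAll(stream));
    }
}
```

Since the returned stream owns the only reference to the array, the bytes are released for garbage collection once the caller is done with the stream, so no explicit cleanup is needed.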

