I definitely appreciate the excitement around the "OpenOffice.org MediaFilter" (both here and at OR2007) that I've built for DSpace. The basis of this custom MediaFilter is a small script I've written which uses the OpenOffice.org API (along with a local installation of OpenOffice.org software) to automate batch format conversions.
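For readers who want to experiment with batch conversion in the meantime, here is a minimal sketch. It is not the script described above and does not use the OpenOffice.org API. It assumes a LibreOffice-style `soffice` binary on the PATH that accepts the `--headless`, `--convert-to` and `--outdir` options (older OpenOffice.org builds may not). The function names, the suffix list and the default `pdf` target are all hypothetical choices made for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# Assumption: these are the presentation formats worth converting.
SOURCE_SUFFIXES = {".ppt", ".pptx", ".odp"}


def build_command(source: Path, out_dir: Path, target: str = "pdf",
                  office_bin: str = "soffice") -> List[str]:
    """Return the argument list for one headless conversion.

    Assumes a LibreOffice-style command line; this is not the
    OpenOffice.org API route taken by the MediaFilter script.
    """
    return [office_bin, "--headless", "--convert-to", target,
            "--outdir", str(out_dir), str(source)]


def find_sources(in_dir: Path) -> List[Path]:
    """List convertible files directly inside in_dir, sorted by name."""
    return sorted(p for p in in_dir.iterdir()
                  if p.is_file() and p.suffix.lower() in SOURCE_SUFFIXES)


def convert_all(in_dir: Path, out_dir: Path, target: str = "pdf",
                office_bin: str = "soffice") -> List[Path]:
    """Convert every source file and return the expected output paths."""
    if shutil.which(office_bin) is None:
        raise FileNotFoundError(f"{office_bin} not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for source in find_sources(in_dir):
        # check=True raises CalledProcessError if the conversion fails.
        subprocess.run(build_command(source, out_dir, target, office_bin),
                       check=True)
        outputs.append(out_dir / f"{source.stem}.{target}")
    return outputs
```

Calling `convert_all(Path("incoming"), Path("converted"))` would write one PDF per presentation into `converted/`; passing `target="html"` requests HTML instead. Text extraction from the resulting files is a separate step that this sketch does not cover.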
One thing to note, though: OpenOffice.org cannot yet convert PowerPoint directly to plain text (for text extraction). Strangely enough, it can convert PowerPoint to either HTML or PDF, and you can then extract the text from either of those formats for full-text indexing.

For those waiting on the code for the OpenOffice.org MediaFilter: as mentioned at OR2007, it will be released completely open source. Since others outside of the DSpace community have also expressed interest, the non-DSpace-specific code will likely be released under its own open source license. This makes things a little complex, since technically UIUC "owns" this code, and I'm going to have to jump through the necessary hoops to make it freely available to all. :)

I'll also be posting more background details on this work to the DSpace Wiki in the next few days (including a link to my OR2007 poster, once I have a free moment to submit it to our IR). Hopefully I can also get the code up there in the next week or two. I'll keep everyone posted!

In the meantime, feel free to contact me if you want a version to "play with" :)

- Tim

--
========================================
Tim Donohue
Research Programmer, Illinois Digital Environment for Access to Learning and Scholarship (IDEALS)
135 Grainger Engineering Library
University of Illinois at Urbana-Champaign
email: [EMAIL PROTECTED]
web: http://ideals.uiuc.edu
phone: (217) 333-4648
fax: (217) 244-7764
========================================

Mark Diggory wrote:
> Tim Donohue had a very elegant project/poster on attaching a
> MediaFilter to an "Open Office" service at OR2007.
>
> -Mark
>
> On Feb 1, 2007, at 8:38 AM, Dorothea Salo wrote:
>
>> Scott Yeadon wrote:
>>> Pan,
>>>
>>> You'll need to write your own media filter class to handle the
>>> extraction of text from PowerPoint files, as PPT text extraction isn't
>>> currently supported by the default set of media filters. Hopefully
>>> someone has already done this and will share, but if not you'll
>>> have to write your own using OpenOffice or some other means.
>>
>> A quick-and-dirty method might be to save the PPT as a PDF and
>> ingest the PDF along with it.
>>
>> Dorothea
>>
>> --
>> Dorothea Salo, Digital Repository Services Librarian
>> (703)993-3742 [EMAIL PROTECTED] AIM: gmumars
>> MSN 2FL, Fenwick Library
>> George Mason University
>> 4400 University Drive, Fairfax VA 22031
>>
>> -------------------------------------------------------------------------
>> Using Tomcat but need to do more? Need to support web services, security?
>> Get stuff done quickly with pre-integrated technology to make your job
>> easier.
>> Download IBM WebSphere Application Server v.1.0.1 based on Apache
>> Geronimo
>> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
>> _______________________________________________
>> DSpace-tech mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>
> Mark R. Diggory
> ~~~~~~~~~~~~~
> DSpace Systems Manager
> MIT Libraries, Systems and Technology Services
> Massachusetts Institute of Technology

