Hi Susan,

We are also running 1.4.2 and have made some local mods to MediaFilterManager.java to work around the errors that occur in filter-media. Our code writes the bitstream_id of each document that cannot be filtered, whether due to invalid characters, non-OCR'd documents, or the Java heap space error, to a locally created table. Once a bitstream_id is in that table, filter-media skips it and will not attempt to filter it again. A report is generated from the entries in the local table and sent to the users for research and possible rescanning. We have identified a lot of documents this way that were scanned but not OCR'd. We've also identified quite a few documents that were corrupted and could not even be opened in DSpace.
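
The skip-table idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual local change to MediaFilterManager.java: the names FilterSkipTable, BitstreamFilter, and FilterRun are hypothetical, an in-memory map stands in for the locally created database table, and no DSpace classes are used.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the locally created table:
// maps a bitstream_id to the reason it could not be filtered.
class FilterSkipTable {
    private final Map<Integer, String> skipped = new LinkedHashMap<>();

    // True if an earlier run already failed on this bitstream.
    boolean shouldSkip(int bitstreamId) {
        return skipped.containsKey(bitstreamId);
    }

    // Record the first failure for a bitstream; later failures are ignored.
    void recordFailure(int bitstreamId, String reason) {
        skipped.putIfAbsent(bitstreamId, reason);
    }

    // One line per skipped bitstream, suitable for a report to users.
    List<String> report() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Integer, String> e : skipped.entrySet()) {
            lines.add("bitstream_id=" + e.getKey() + ": " + e.getValue());
        }
        return lines;
    }
}

// Hypothetical callback representing whatever does the real filtering work.
interface BitstreamFilter {
    void filter(int bitstreamId) throws Exception;
}

class FilterRun {
    // Filters every id not already in the skip table and returns the number
    // that succeeded. Failures, including heap exhaustion, are recorded so
    // the same bitstream is not retried on the next run.
    static int run(List<Integer> ids, FilterSkipTable table, BitstreamFilter filter) {
        int filtered = 0;
        for (int id : ids) {
            if (table.shouldSkip(id)) {
                continue;
            }
            try {
                filter.filter(id);
                filtered++;
            } catch (Exception | OutOfMemoryError e) {
                table.recordFailure(id, e.getClass().getSimpleName() + ": " + e.getMessage());
            }
        }
        return filtered;
    }
}
```

In a real deployment the map would be replaced by a database table so that the skip list survives between cron runs, and catching OutOfMemoryError is only a best-effort measure since the JVM may not recover cleanly.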
I'd be happy to send you our code if you're interested.

Happy Holidays,
Sue Walker-Thornton

Sue Walker-Thornton
ConITS Contract
NASA Langley Research Center
Integrated Library Systems Application & Database Administrator
130 Research Drive
Hampton, VA 23666
Office: (757) 224-4074
Fax: (757) 224-4001
Pager: (757) 988-2547
Email: susan.m.thorn...@nasa.gov

-----Original Message-----
From: susan rector [mailto:setea...@vcu.edu]
Sent: Monday, December 22, 2008 8:15 AM
To: George Stanley Kozak
Cc: dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] Java heap space error

Thanks George,

Hmmm. I'm looking at our logs now. I would love to see the configs if you would forward. I've had trouble with the filter-media program, too and recently took it out of cron...

Thanks again,
Susan

George Stanley Kozak wrote:
> Susan:
>
> Yes, I have been having the same problem. I narrowed it down to what I
> think is a malicious harvester. I have blocked them temporarily and I
> think everything is OK (at least for the last 3 days).
>
> I made some changes to my Tomcat and PostgreSQL config files as well to
> allocate more memory and Mark Diggory suggested using mod_cband
> to throttle back aggressive clients.
>
> Susan Thornton of NASA said that at her site this is caused by the
> filter-media program.
>
> If you'd like to see what my configs are set to, I'll be happy to share that.
>
>
>> Happy holidays all,
>>
>> Need some advice.
>>
>> My dspace app is crashing every other day, with a memory leak error
>> popping up in the Tomcat logs:
>>
>> Dec 21, 2008 3:17:46 AM
>> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable run
>> SEVERE: Caught exception (java.lang.OutOfMemoryError: Java heap space)
>> executing org.apache.jk.common.channelsocket$socketconnect...@7f84c9,
>> terminating thread
>>
>> I'm still running Dspace 1.4.2 on RedHat Linux 5. Postgres is my database.
>>
>> Anyone else been through this with the Dspace app?
>>
>> Thanks,
>>
>> Susan Teague Rector
>> VCU Libraries
>> setea...@vcu.edu
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> DSpace-tech mailing list
>> DSpace-tech@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>>
>>
>
>
> ****************************************
> George Kozak
> Coordinator
> Web Development and Management
> Digital Media Group
> 501 Olin Library
> Cornell University 14853
> g...@cornell.edu
> 607-255-8924
>