I believe this problem has been fixed with 0.6.1. Please give it a try.

Ben Litchfield
--

On Thu, 6 Mar 2003, Eric Anderson wrote:

> When it throws the exception, the indexer fails, so I cannot continue the index.
>
> It appears that it's only related to some files, as I have been able to remove
> some of the files, and it will continue past that point, but if it encounters
> one of these files, the index fails.
>
> Eric Anderson
> LanRx Network Solutions
> 815-505-6132
>
>
> Quoting Ben Litchfield <[EMAIL PROTECTED]>:
>
> > In this release I have changed how I parsed the document, which may have
> > introduced this bug. I have received another report of this and will have
> > it fixed for the next point release.
> >
> > You said you tried with reasonably sized PDF repository. Did you stop
> > indexing at this error or did you continue? If you continued, is this the
> > only error that you got?
> >
> > -Ben
> >
> > --
> >
> > On Thu, 6 Mar 2003, Eric Anderson wrote:
> >
> > > Ben-
> > > In attempting to use the PDFBox-0.6.0, I rec'd the following error when
> > > attempting to scan a reasonably sized PDF repository.
> > >
> > > Any thoughts?
> > >
> > >
> > > caught a class java.io.EOFException
> > > with message: Unexpected end of ZLIB input stream
> > >
> > >
> > > Eric Anderson
> > > LanRx Network Solutions
> > >
> > >
> > > Quoting Ben Litchfield <[EMAIL PROTECTED]>:
> > >
> > > > I would like to announce the next release of PDFBox. PDFBox allows for
> > > > PDF documents to be indexed using lucene through a simple interface.
> > > > Please take a look at org.pdfbox.searchengine.lucene.LucenePDFDocument,
> > > > which will extract all text and PDF document summary properties as lucene
> > > > fields.
> > > >
> > > > You can obtain the latest release from http://www.pdfbox.org
> > > >
> > > > Please send all bug reports to me and attach the PDF document when
> > > > possible.
> > > >
> > > > RELEASE 0.6.0
> > > > -Massive improvements to memory footprint.
> > > > -Must call close() on the COSDocument(LucenePDFDocument does this for you)
> > > > -Really fixed the bug where small documents were not being indexed.
> > > > -Fixed bug where no whitespace existed between obj and start of object.
> > > >  Exception in thread "main" java.io.IOException: expected='obj' actual='obj<</Pro
> > > > -Fixed issue with spacing where textLineMatrix was not being copied
> > > >  properly
> > > > -Fixed 'bug' where parsing would fail with some pdfs with double endobj
> > > >  definitions
> > > > -Added PDF document summary fields to the lucene document
> > > >
> > > >
> > > > Thank you,
> > > > Ben Litchfield
> > > > http://www.pdfbox.org
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > For additional commands, e-mail: [EMAIL PROTECTED]
> > >
> > > LanRx Network Solutions, Inc.
> > > Providing Enterprise Level Solutions...On A Small Business Budget
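For anyone hitting the same failure before upgrading: a minimal sketch, in Java, of one way to keep an index run going when a single PDF throws, as in the EOFException case quoted above. This is not PDFBox code. The class name TolerantIndexer and the FileIndexer interface are hypothetical stand-ins for whatever per-file indexing step is in use; the only thing shown is the per-file try/catch.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class TolerantIndexer {

    /** Hypothetical stand-in for the real per-file indexing step. */
    public interface FileIndexer {
        void index(File pdf) throws IOException;
    }

    /**
     * Indexes each file in turn. An IOException on one file is logged and
     * recorded, and the loop moves on to the next file instead of aborting.
     * Returns the files that could not be indexed.
     */
    public static List<File> indexAll(List<File> pdfs, FileIndexer indexer) {
        List<File> failed = new ArrayList<>();
        for (File pdf : pdfs) {
            try {
                indexer.index(pdf);
            } catch (IOException e) {
                // java.io.EOFException is a subclass of IOException, so the
                // "Unexpected end of ZLIB input stream" case is caught here.
                System.err.println("Skipping " + pdf + ": " + e.getMessage());
                failed.add(pdf);
            }
        }
        return failed;
    }
}
```

In the setup described in this thread, the FileIndexer body would presumably build a document through org.pdfbox.searchengine.lucene.LucenePDFDocument and add it to the Lucene index; those calls are not shown in the thread and are left out here. The returned list gives the files worth attaching to a bug report.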
