Hi, thanks for the patch. We'll test this as soon as possible, since not being able to index PDFs isn't really an option. PDF indexing in Lucene was a key feature for our customer ;)
What's interesting is that the indexing process seems to have some kind of "auto repair". When we tried to re-index, I got the error stated above and nothing was indexed (neither containers nor PDFs), i.e. nothing was available through the Lucene search. But on the next day, after the indexing process had had some time to run, pretty much everything was available through search. So at least you get most of the data back in Lucene...

Regarding your statement about "some pdf files": we got this error 70+ times, with an average number of 400-500+ PDFs.

Cheers
Daniel

On Mon, 6 Dec 2004 17:48:29 +0100, Khue Nguyen <[EMAIL PROTECTED]> wrote:
> Hi,
>
> we found that with some pdf files, pdfbox can run in an infinite loop.
> What you can do is :
>
> a) Deactivate pdf indexation by removing the pdf parser in the
> WEB-INF\etc\config\fileextractor.xml config file.
>
> then try to reindex your jahia instance.
>
> or
>
> b) try with the attached files. If ok, we will package them with the next
> jahia405 final.
>
> replace the PDFBox.0.6.6.jar with the patched one in WEB-INF/lib
> and put PDFExtractor.class in classes/org/jahia/utils/fileparsers dir.
>
> Regards,
> Khue Nguyen
>
>
> ----- Original Message -----
> From: "Daniel Zimmermann" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Thursday, December 02, 2004 5:07 PM
> Subject: Jahia 4.0.4 --> 4.0.5 migration problems
>
>
> > Hi!
> >
> > We are currently in migration from Jahia 4.0.4 to 4.0.5. So far
> > everything works fine... BUT ;D there are two issues. One minor and
> > the other which is a bit nasty. Here they are:
> >
> > 1. Webapp deployment (the minor one)
> > The web-apps haven't been deployed correctly. It's possible that we did
> > something wrong, but maybe someone else has had this problem before.
> > It would also be interesting to know a strategy to migrate the data of
> > certain webapps (e.g. forum has a date bug in 4.0.4) to
> > the app logic of 4.0.5 with data from 4.0.4.
> >
> > 2. Search index broken and can't be re-indexed
> > The search indexes are broken after migrating to 4.0.5. After
> > realizing this, I hit "re-index" at the administration panel. The
> > pop-up "pops up" and tells me that it's re-indexing the site. The
> > problem is that it seems to run forever. In our catalina.out there
> > are multiple error messages which all say the same thing:
> >
> > 2968239 [http-80-Processor4] INFO - Finished pdf extraction with PDFBox in 474ms.
> > 2968239 [http-80-Processor4] INFO - Finished reading pdf Reader to String in 0ms.
> > 2969435 [http-80-Processor4] INFO - Finished pdf extraction with PDFBox in 158ms.
> > 2969435 [http-80-Processor4] INFO - Finished reading pdf Reader to String in 0ms.
> > java.lang.Throwable: Warning: You did not close the PDF Document
> > at org.pdfbox.cos.COSDocument.finalize()V(COSDocument.java:386)
> > at java.lang.Object.runFinalizer()V(Unknown Source)
> > at java.lang.LangAccessImpl.objectFinalize(Ljava.lang.Object;)V(Unknown Source)
> > at java.lang.ref.Finalizer.runFinalizer()V(Unknown Source)
> > at java.lang.ref.Finalizer.access$100(Ljava.lang.ref.Finalizer;)V(Unknown Source)
> > at java.lang.ref.Finalizer$FinalizerThread.run()V(Unknown Source)
> >
> > I'm pretty sure there was an issue on this before, but I can't manage
> > to find the mailing list entry again.
> >
> > So far we have migrated only the DB content (mysql) and the filesystem
> > content (/WEB-INF/var/content/*). So there shouldn't be any side
> > effects from template customizing etc.
> >
> > Best regards
> > Daniel Zimmermann
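
The thread mentions that pdfbox can run in an infinite loop on some PDF files, which blocks re-indexing. One generic way to keep a single bad file from stalling a whole run is to bound each extraction with a timeout. The sketch below is only an illustration under assumptions: `ExtractionGuard` and `extractWithTimeout` are hypothetical names, not part of Jahia or PDFBox, the extraction itself is passed in as a plain `Callable`, and the code assumes a much newer Java (8 or later) than the setup discussed in this thread. It is not the fix contained in the attached patch.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not part of Jahia or PDFBox.
class ExtractionGuard {

    // Runs the given extraction on a separate daemon thread and waits at most
    // timeoutMillis for its result. Returns an empty Optional if the task
    // times out, fails, or the calling thread is interrupted.
    static Optional<String> extractWithTimeout(Callable<String> extraction, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "pdf-extraction");
            t.setDaemon(true); // a stuck task must not keep the JVM alive
            return t;
        });
        Future<String> future = executor.submit(extraction);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker thread; a loop that never
            // checks the interrupt flag keeps running in the background.
            future.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            // The extraction itself threw; treat the file as not indexable.
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return Optional.empty();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A task that finishes quickly, and one that simulates a hanging parser.
        Optional<String> fast = extractWithTimeout(() -> "extracted text", 1000);
        Optional<String> slow = extractWithTimeout(() -> {
            Thread.sleep(60_000);
            return "never returned";
        }, 200);
        System.out.println(fast.isPresent());
        System.out.println(slow.isPresent());
    }
}
```

A timeout like this only lets the caller move on to the next file. It does not stop a parser that ignores interrupts, so a patched library is still the real fix.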
