Hi,

thanks for the patch. We'll test it as soon as possible, since not being
able to index PDFs isn't really an option. PDF indexing in Lucene was a
key feature for our customer ;)

What's interesting is that the indexing process seems to have some
kind of "auto repair".
When we tried to re-index, I got the error stated above and nothing
(neither containers nor PDFs) was indexed, i.e. nothing was available
through Lucene search. But the next day, once the indexing process had
had some time to run, pretty much everything was available through
search. So at least we have most of the data back in Lucene...

Regarding your statement about "some pdf files": we got this error
70+ times, out of roughly 400-500+ PDFs.
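For context on the warning itself: the stack trace shows it is thrown from COSDocument.finalize(), i.e. PDFBox complains when a document object is garbage-collected without close() having been called on it. Until the patched jar is confirmed, a general defensive pattern is to close every parsed document in a finally block and to give each extraction a deadline, so that one bad PDF can't stall the whole re-index. The Java sketch below shows that pattern with java.util.concurrent. It is not Jahia's PDFExtractor and not the PDFBox API: ParsedDocument, DocumentOpener and extractWithTimeout are hypothetical names standing in for the real classes.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class GuardedExtraction {

    /** Hypothetical stand-in for a parsed PDF (e.g. a PDFBox document object). */
    interface ParsedDocument extends Closeable {
        String extractText() throws IOException;
    }

    /** Hypothetical factory that opens and parses one document. */
    interface DocumentOpener {
        ParsedDocument open() throws IOException;
    }

    /**
     * Opens a document, extracts its text on a pool thread and waits at most
     * timeoutMillis for the result. The document is closed in a finally block,
     * so it is never left for the finalizer to complain about.
     * Returns null if extraction failed or timed out, so the caller can skip the file.
     */
    static String extractWithTimeout(DocumentOpener opener, ExecutorService pool, long timeoutMillis) {
        Future<String> future = pool.submit(() -> {
            ParsedDocument doc = opener.open();
            try {
                return doc.extractText();
            } finally {
                doc.close(); // always release the document, success or failure
            }
        });
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Only sends an interrupt; a loop that never checks for it keeps running.
            future.cancel(true);
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return null;
        } catch (ExecutionException e) {
            return null; // extraction threw; the document was still closed
        }
    }
}
```

Note that Future.cancel(true) only interrupts the worker thread: code stuck in a tight loop that never checks for interruption keeps running (and keeps its document open). The guard lets the indexer move on to the next file, but a fixed parser, like the patched jar, is still needed.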

Cheers
Daniel

On Mon, 6 Dec 2004 17:48:29 +0100, Khue Nguyen <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> we found that with some PDF files, PDFBox can run in an infinite loop.
> What you can do:
> 
> a) Deactivate PDF indexing by removing the PDF parser from the
> WEB-INF\etc\config\fileextractor.xml config file.
> 
> Then try to re-index your Jahia instance.
> 
> or
> 
> b) Try the attached files. If they work, we will package them with the
> next jahia405 final.
> 
> Replace PDFBox.0.6.6.jar in WEB-INF/lib with the patched one,
> and put PDFExtractor.class in the classes/org/jahia/utils/fileparsers directory.
> 
> Regards,
> Khue Nguyen
> 
> ----- Original Message -----
> From: "Daniel Zimmermann" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Thursday, December 02, 2004 5:07 PM
> Subject: Jahia 4.0.4 --> 4.0.5 migration problems
> 
> > Hi!
> >
> > We are currently migrating from Jahia 4.0.4 to 4.0.5. So far
> > everything works fine... BUT ;D there are two issues, one minor and
> > one a bit nasty. Here they are:
> >
> > 1. Webapp deployment (the minor one)
> > The webapps haven't been deployed correctly. It's possible that we did
> > something wrong, but maybe someone else has had this problem before.
> > It would also be interesting to know a strategy for migrating the data
> > of certain webapps (e.g. the forum has a date bug in 4.0.4), i.e. running
> > the 4.0.5 app logic on data from 4.0.4.
> >
> > 2. Search index broken and can't be re-indexed
> > The search indexes are broken after migrating to 4.0.5. After
> > realizing this, I hit "re-index" in the administration panel. The
> > pop-up "pops up" and tells me that it's re-indexing the site. The
> > problem is that it seems to run forever. In our catalina.out there
> > are multiple error messages, which all say the same thing:
> >
> > 2968239 [http-80-Processor4]  INFO - Finished pdf extraction with PDFBox in 474ms.
> > 2968239 [http-80-Processor4]  INFO - Finished reading pdf Reader to String in 0ms.
> > 2969435 [http-80-Processor4]  INFO - Finished pdf extraction with PDFBox in 158ms.
> > 2969435 [http-80-Processor4]  INFO - Finished reading pdf Reader to String in 0ms.
> > java.lang.Throwable: Warning: You did not close the PDF Document
> >        at org.pdfbox.cos.COSDocument.finalize()V(COSDocument.java:386)
> >        at java.lang.Object.runFinalizer()V(Unknown Source)
> >        at java.lang.LangAccessImpl.objectFinalize(Ljava.lang.Object;)V(Unknown Source)
> >        at java.lang.ref.Finalizer.runFinalizer()V(Unknown Source)
> >        at java.lang.ref.Finalizer.access$100(Ljava.lang.ref.Finalizer;)V(Unknown Source)
> >        at java.lang.ref.Finalizer$FinalizerThread.run()V(Unknown Source)
> >
> > I'm pretty sure there was an issue about this before, but I can't
> > manage to find the mailing list entry again.
> >
> > So far we have only migrated the DB content (MySQL) and the filesystem
> > content (/WEB-INF/var/content/*). So there shouldn't be any side
> > effects from template customization etc.
> >
> > Best regards
> > Daniel Zimmermann
> 
> 
>
