Hi,
we're now catching all throwables and closing the document in a finally block. That PDFBox still fails on some PDFs is a separate issue, and we're still looking for a better parsing solution.
Thanks, Khue Nguyen
----- Original Message ----- From: "Daniel Zimmermann" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, December 08, 2004 10:24 AM
Subject: Re: Jahia 4.0.4 --> 4.0.5 migration problems
Hello Khue,
the patch works better than the version provided in 4.0.5PR. But we
still get those nasty
"You did not close the PDF Document" errors from PDFBox. Not that
often, but still often enough. I did some searching and found the following
discussion: http://sourceforge.net/tracker/index.php?func=detail&aid=875161&group_id=78314&atid=552832
Seems like PDDocuments need to be closed with parser.getDocument().close() or the exception is thrown when the COSDocument is finalized. A side effect is that temporary data remains on the filesystem and is never deleted.
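A minimal sketch of that close-in-finally pattern, written against stand-in types so it runs without PDFBox on the classpath. The names `ParsedDocument`, `DocumentParser`, `FakeDocument` and `extract` are hypothetical and are not part of PDFBox or Jahia; in real code the close call would be the parser.getDocument().close() mentioned above.

```java
import java.io.Closeable;
import java.io.IOException;

class SafePdfExtraction {

    /** Hypothetical stand-in for a parsed PDF document; NOT the PDFBox API. */
    interface ParsedDocument extends Closeable {
        String extractText() throws IOException;
    }

    /** Hypothetical stand-in for a parser that produces a document. */
    interface DocumentParser {
        ParsedDocument parse() throws IOException;
    }

    /** Fake document used only to demonstrate the pattern. */
    static class FakeDocument implements ParsedDocument {
        final boolean failOnExtract;
        boolean closed = false;

        FakeDocument(boolean failOnExtract) {
            this.failOnExtract = failOnExtract;
        }

        @Override
        public String extractText() {
            if (failOnExtract) {
                throw new IllegalStateException("simulated parser failure");
            }
            return "hello";
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /**
     * Returns the extracted text, or an empty string if extraction fails.
     * The document is closed on every path, so nothing is left for a
     * finalizer to complain about.
     */
    static String extract(DocumentParser parser) {
        ParsedDocument doc = null;
        try {
            doc = parser.parse();
            return doc.extractText();
        } catch (Throwable t) {
            // Deliberately broad: one bad file should not abort a whole re-index run.
            System.err.println("PDF extraction failed: " + t);
            return "";
        } finally {
            if (doc != null) {
                try {
                    doc.close();
                } catch (IOException e) {
                    System.err.println("Could not close document: " + e);
                }
            }
        }
    }

    public static void main(String[] args) {
        FakeDocument good = new FakeDocument(false);
        FakeDocument bad = new FakeDocument(true);
        System.out.println("good: '" + extract(() -> good) + "' closed=" + good.closed);
        System.out.println("bad:  '" + extract(() -> bad) + "' closed=" + bad.closed);
    }
}
```

The null check in the finally block matters because the parse step itself may throw before a document exists. Catching Throwable also swallows errors such as OutOfMemoryError, so a narrower catch may be preferable wherever the indexer can afford to stop.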
Another track to a solution could be this discussion: http://www.textmining.org/modules.php?op=modload&name=News&file=article&sid=7&POSTNUKESID=965198834c64701a28bcf3a16f1b2a16 It's an Excel file parser someone wrote based on POI. They also provide a solution to get rid of those error messages if an exception occurs.
It was just a bit of googling and maybe you know that already and/or I'm completely on the wrong track.... but maybe that just solves the problem? ;D
Cheers Daniel Zimmermann
On Mon, 6 Dec 2004 17:48:29 +0100, Khue Nguyen <[EMAIL PROTECTED]> wrote:
Hi,
we found that with some PDF files, PDFBox can run into an infinite loop. What you can do is:
a) Deactivate PDF indexing by removing the PDF parser from the WEB-INF\etc\config\fileextractor.xml config file,
then try to reindex your Jahia instance.
or
b) Try the attached files. If they work, we will package them with the next Jahia 4.0.5 final.
Replace the PDFBox.0.6.6.jar with the patched one in WEB-INF/lib and put PDFExtractor.class in the classes/org/jahia/utils/fileparsers dir.
Regards, Khue Nguyen
----- Original Message ----- From: "Daniel Zimmermann" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, December 02, 2004 5:07 PM
Subject: Jahia 4.0.4 --> 4.0.5 migration problems
> Hi!
>
> We are currently in migration from Jahia 4.0.4 to 4.0.5. So far
> everything works fine... BUT ;D there are two issues. One minor and
> the other which is a bit nasty. Here they are:
>
> 1. Webapp deployment (the minor one)
> The web-apps haven't been deployed correctly. It's possible that we did
> something wrong, but maybe someone else has had this problem before.
> It would also be interesting to know a strategy for migrating the data of
> certain webapps (e.g. the forum, which has a date bug in 4.0.4) from
> 4.0.4 to the app logic of 4.0.5.
>
> 2. Search index broken and can't be re-indexed
> The search indexes are broken after migrating to 4.0.5. After
> realizing this, I hit "re-index" at the administration panel. The
> pop-up "pops up" and tells me that it's re-indexing the site. The
> problem is that it seems to run forever. In our catalina.out there
> are multiple error messages which all say the same:
>
> 2968239 [http-80-Processor4] INFO - Finished pdf extraction with PDFBox in 474ms.
> 2968239 [http-80-Processor4] INFO - Finished reading pdf Reader to String in 0ms.
> 2969435 [http-80-Processor4] INFO - Finished pdf extraction with PDFBox in 158ms.
> 2969435 [http-80-Processor4] INFO - Finished reading pdf Reader to String in 0ms.
> java.lang.Throwable: Warning: You did not close the PDF Document
> at org.pdfbox.cos.COSDocument.finalize()V(COSDocument.java:386)
> at java.lang.Object.runFinalizer()V(Unknown Source)
> at java.lang.LangAccessImpl.objectFinalize(Ljava.lang.Object;)V(Unknown Source)
> at java.lang.ref.Finalizer.runFinalizer()V(Unknown Source)
> at java.lang.ref.Finalizer.access$100(Ljava.lang.ref.Finalizer;)V(Unknown Source)
> at java.lang.ref.Finalizer$FinalizerThread.run()V(Unknown Source)
>
> I'm pretty sure there was an issue on this before, but I can't manage
> to find the mailing list entry again.
>
> So far we have only migrated the DB content (MySQL) and the filesystem
> content (/WEB-INF/var/content/*). So there shouldn't be any side
> effects from template customizing etc.
>
> Best regards
> Daniel Zimmermann
