I get nervous when I hear calls for re-throwing an exception. In my opinion, an exception should be re-thrown if either:

1. The exception is due to something so drastic that the entire operation needs to be brought to an end, or
2. The exception indicates a condition for which the method experiencing the exception cannot adjust, but the calling method *can*.
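
To make the two cases above concrete, here is a minimal Java sketch. It is not PDFBox code: `RethrowSketch`, `PageException`, `extractPage` and `extractAll` are hypothetical names invented for illustration, and pages are modelled as plain strings. A per-page problem is handled where it occurs, while an unreadable document is left to propagate to the caller.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from PDFBox.
class RethrowSketch {

    /** Hypothetical small, per-page problem that the extraction loop can absorb. */
    static class PageException extends Exception {
        PageException(String message) {
            super(message);
        }
    }

    /** Stand-in for extracting the text of a single page. */
    static String extractPage(String page) throws PageException, IOException {
        if (page == null) {
            // Drastic: the input itself is unusable, so the whole operation must stop.
            throw new IOException("document is unreadable");
        }
        if (page.startsWith("scan:")) {
            // Minor: this page has no text layer, but other pages may be fine.
            throw new PageException("no text layer on page");
        }
        return page.trim();
    }

    /** Extracts every page, skipping pages that fail in a recoverable way. */
    static List<String> extractAll(List<String> pages) throws IOException {
        List<String> texts = new ArrayList<>();
        for (String page : pages) {
            try {
                texts.add(extractPage(page));
            } catch (PageException e) {
                // Handled here: record an empty page and carry on with the rest.
                texts.add("");
            }
            // IOException is deliberately not caught: only the caller can decide
            // what to do with a document that cannot be read at all.
        }
        return texts;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(extractAll(Arrays.asList(" hello ", "scan:page2", "world")));
    }
}
```

With this shape, one scanned page costs only that page's text instead of the whole document.
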

When I started using PDFBox, exceptions usually bubbled all the way to the top ... so a very small problem handling a PDF would mean we could do nothing with it.

On Sat, Jan 28, 2012 at 12:11 PM, Mahesh Yadav <[email protected]> wrote:

> Thanks Timo,
>
> I will give a try of your fix and let you know.
>
> Currently I was working on solution which will ignore text extraction on a
> page containing images (scanned page), I am done with my changes but still
> need to validate it by some performance tests. This at least will not crash
> my application if someone uploads scanned pdf on loaded system.
>
> I was wondering if we have some configuration by which we can ignore
> rendering (text extraction) of images in pdf, in my case this would be
> scanned pages?.
>
> Thanks
> Mahesh
>
>
> On Fri, Jan 27, 2012 at 3:30 PM, Timo Boehme <[email protected]> wrote:
>
> > I continue this thread on dev list in order to not clutter JIRA issue
> > PDFBOX-847.
> >
> > Mahesh Yadav commented on PDFBOX-847:
> >> -------------------------------------
> >> ...
> >> We use jackrabbit and only difference that we have is we have our own
> >> custom parser (not provided by jackrabbit) for parsing pdf and we interact
> >> with pdfbox as shown below.
> >>
> >> PDFParser parser = new PDFParser(new BufferedInputStream(stream));
> >> PDDocument document = parser.getPDDocument();
> >> parser.parse();
> >> PDFTextStripper stripper = new PDFTextStripper();
> >> stripper.setLineSeparator("\n");
> >> stripper.writeText(document, writer)
> >>
> >> I think we need to change above approach and use " PDDocument.load" with
> >> RandomAccessFile
> >>
> >
> > if you set a temporary directory before parse() with
> > parser.setTempDirectory
> > it will automatically use temporary file instead of memory buffer.
> >
> >
> > Timo
> >
> > --
> >
> > Timo Boehme
> > OntoChem GmbH
> > H.-Damerow-Str. 4
> > 06120 Halle/Saale
> > T: +49 345 4780474
> > F: +49 345 4780471
> > [email protected]
> >
> > _____________________________________________________________________
> >
> > OntoChem GmbH
> > Geschäftsführer: Dr. Lutz Weber
> > Sitz: Halle / Saale
> > Registergericht: Stendal
> > Registernummer: HRB 215461
> > _____________________________________________________________________
