Hi,

maybe we can join forces here, as I'm currently working on an Xref class which
parses xref tables and xref streams. One method should also do the mentioned
scanning.
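
To make the scanning idea from the quoted message below a bit more concrete,
here is a minimal sketch, assuming the whole file fits in memory and that
object headers look like "N G obj". The class and method names are made up for
illustration and are not part of PDFBox. It only collects start offsets, the
last definition of an object number wins, the generation number is ignored,
and a real repair pass would also have to rule out false hits inside stream
data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an existing PDFBox class.
final class XrefScanSketch {

    // "<object number> <generation> obj", not preceded by another digit.
    // The digit limits keep Long.parseLong from overflowing on long digit runs.
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(?<![0-9])(\\d{1,10})\\s+(\\d{1,5})\\s+obj\\b");

    // Returns object number -> byte offset of its "N G obj" header.
    // Later occurrences overwrite earlier ones, mimicking incremental updates.
    static Map<Long, Long> scanObjectOffsets(byte[] pdfBytes) {
        // ISO-8859-1 maps each byte to exactly one char, so char index == byte offset.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Map<Long, Long> offsets = new TreeMap<>();
        Matcher m = OBJ_HEADER.matcher(text);
        while (m.find()) {
            offsets.put(Long.parseLong(m.group(1)), (long) m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        String sample = "%PDF-1.4\n1 0 obj\n<< >>\nendobj\n2 0 obj\n<< >>\nendobj\n";
        byte[] bytes = sample.getBytes(StandardCharsets.ISO_8859_1);
        scanObjectOffsets(bytes).forEach((num, off) ->
                System.out.println("object " + num + " starts at offset " + off));
    }
}
```

The resulting map could then be used to rebuild the xref entries when the
regular lookup fails.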

Kind regards

Maruan Sahyoun

On 19.07.2012 at 09:42, "Andreas Lehmkühler" <andr...@lehmi.de> wrote:

> 
> Timo Boehme <timo.boe...@ontochem.com> wrote on 16 July 2012 at 18:02:
> 
>> Hi,
>> 
>> On 16.07.2012 17:48, Andreas Lehmkuehler wrote:
>>> On 10.07.2012 09:16, Timo Boehme wrote:
>>>> ...
>>>> looks good to me. Some mention about the preflight module which will be
>>>> integrated in the next major release?
>>> Thanks for your comment. I added some information about preflight/xmpbox
>>> as you may have already seen.
>> 
>> Yes, thank you very much for all the time spent on administrative
>> tasks/improvements on PDFBOX.
>> 
>> Next, I plan to improve the parser's robustness against broken documents
>> by doing a first scan over the document (in case of parsing failure),
>> collecting object start/end points and using them to repair the
>> xref table.
> 
> 
> Seems to be necessary, at least for some PDFs. :-(
> 
> 
>> Another task I would like to do is reducing the amount of memory needed
>> by using the existing file as input stream resource instead of copying
>> an object stream first to a temporary buffer (in cases where an input
>> file exists).
>> Maybe for this we should switch from assuming we have an input stream to
>> assuming we have an input file, and if we only have an input stream, a
>> temporary file is created on the fly - WDYT?
> 
> 
> I guess internally we have to use something abstract, and as everything is a
> stream, that might be a good choice. AFAIU the current implementation, one
> reason for the usage of a temporary buffer is the fact that the data is
> modified (decompressing, decrypting) and we must not alter the input data. It
> is perhaps a better idea to somehow split the input stream and the unfiltered
> input stream, e.g. read from the input stream every time an object is
> dereferenced and store the (decompressed) data in the corresponding object.
> 
>> 
>> 
>> Kind regards,
>> Timo
> 
> 
> BR
> Andreas Lehmkühler
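
As a rough illustration of the "input file first, temporary file as fallback"
idea discussed above, the following sketch spools an input stream to a
temporary file once and then reads byte ranges from it on demand. All names
are hypothetical and nothing here reflects the current PDFBox code. The file
is opened read-only, so the input data is never altered, and any decompressed
or decrypted data would have to be kept elsewhere (e.g. in the dereferenced
object).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not an existing PDFBox class.
final class FileBackedSourceSketch {

    // Copies the stream into a temporary file so that later reads can seek freely.
    static Path spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("pdf-input-", ".tmp");
        tmp.toFile().deleteOnExit();
        // createTempFile already created the file, so it must be replaced.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Reads exactly 'length' raw bytes starting at 'offset'; the file is opened read-only.
    static byte[] readRange(Path file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            byte[] buf = new byte[length];
            raf.readFully(buf); // throws EOFException if the range runs past the end
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        // With a real input file the spooling step would simply be skipped.
        Path tmp = spoolToTempFile(System.in);
        System.out.println("spooled " + Files.size(tmp) + " bytes to " + tmp);
    }
}
```

Whether a temporary file is cheaper than an in-memory buffer depends on the
document size and the environment, so this is only one possible trade-off.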
