Hi Tim,

I've created https://issues.apache.org/jira/browse/PDFBOX-3058 to track our 
part of fixing issues as part of the test (and later ones to come), and added 
you and Tilman as watchers.

BR
Maruan


> On 23.10.2015 at 21:36, Allison, Timothy B. <[email protected]> wrote:
> 
> All,
> 
>  Apologies for the delay.  I finally finished the comparison of text 
> extracted from 100k pdfs with 1.8.10 and 2.0 trunk 
> (pdfbox-2.0.0-20151022.051152-1783).
> The reports are available here [0].  I botched the commit message...
> 
>  I haven't had a chance to review the results.  The eval code is still in 
> development and there might be bugs! To view the docs, prepend: h t t p : 
> slash slash one six two . two four two . two two eight . one seven four/docs/ 
>  ... just don't let any of the scrapers read that. ;)  The docs include all 
> those within our corpus that had an RTL word (when extracted with 1.8.10 :)) 
> and then I took a random selection to fill out ~100k pdfs from common crawl 
> and govdocs1.
> 
>  Let me know if you have any questions.
> 
>          Cheers,
> 
>                     Tim
> 
> 
> [0] 
> https://github.com/tballison/share/blob/master/pdfbox_comparisons/pdfbox_1_8_10V2_0_20151023.zip
> 
