[ After our latest SMTP exchange, notice what Hotmail does
with just plain text LOL... I'm not sure anyone even tests this
stuff ]





________________________________
> Date: Fri, 28 May 2010 09:03:04 -0700
> From: [email protected]
> To: [email protected]
> Subject: Re: [iText-questions] Spam: Unit testing flattened PDFs
>
> Unit testing PDF is Notoriously Difficult.
>
For just plain pixel compares: I've suggested this before, but if you are
really stuck and have resources, consider something like instrumented video
compression libraries. That is, the compression relies on isolating things of
perceptual interest, such as motion vectors. Ideally, if you could get a
result that says "this block moved over between the two frames",
that might be the metric you want.
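
To make the motion-vector idea concrete, here is a minimal sketch of
exhaustive block matching, which is roughly what a video encoder does when it
estimates motion. It is not taken from any real codec: the function names,
the grayscale list-of-rows image format and the sum-of-absolute-differences
cost are all assumptions chosen for illustration.

```python
def block_at(img, top, left, size):
    """Return the size x size block of img whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def motion_vector(prev, curr, top, left, size, radius):
    """Find where the block at (top, left) in prev ended up in curr.

    Every offset within +/- radius pixels is tried (hypothetical brute-force
    search). Returns (dy, dx, cost) for the cheapest offset, or None if no
    candidate position fits inside curr.
    """
    ref = block_at(prev, top, left, size)
    height, width = len(curr), len(curr[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue  # candidate block would fall outside the frame
            cost = sad(ref, block_at(curr, t, l, size))
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best
```

For a test, "prev" would be the raster of the reference page and "curr" the
raster of the page under test. A result of (0, 0, 0) for a field's block means
it did not move; a nonzero offset with a low cost says the block moved over
rather than changed.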

>
> Ideally, you’d save the coordinates of your various fields and run OCR on
> your resulting flattened PDF, looking for the correct text in the correct
> place.

Well, presumably you have the fonts, so you could render the expected text
(e.g. with ligatures etc.) and just look for pixel blocks that match. This is
a lot easier than general OCR with unknown fonts or sizes (if you can't
estimate these a priori you are stuck LOL).
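
As a rough sketch of that pixel-block lookup: assume the expected text has
already been rendered to a small bitmap with the real font (that step is not
shown), and that both the bitmap and the rasterized page are lists of rows of
pixel values. The names "mismatches" and "find_near", and the slack and
tolerance parameters, are made up for this example.

```python
def mismatches(page, template, top, left):
    """Count pixels where template differs from page when placed at (top, left)."""
    return sum(
        1
        for r, row in enumerate(template)
        for c, pixel in enumerate(row)
        if page[top + r][left + c] != pixel
    )


def find_near(page, template, expected, slack=2, tolerance=0):
    """Look for template within slack pixels of the expected (top, left).

    Returns the (top, left) with the fewest mismatching pixels, provided that
    count is at most tolerance; returns None if no position qualifies.
    """
    t_height, t_width = len(template), len(template[0])
    p_height, p_width = len(page), len(page[0])
    best = None
    for top in range(expected[0] - slack, expected[0] + slack + 1):
        for left in range(expected[1] - slack, expected[1] + slack + 1):
            if top < 0 or left < 0:
                continue  # template would start outside the page
            if top + t_height > p_height or left + t_width > p_width:
                continue  # template would run off the page
            bad = mismatches(page, template, top, left)
            if bad <= tolerance and (best is None or bad < best[0]):
                best = (bad, (top, left))
    return None if best is None else best[1]
```

Because the search is limited to a few pixels around the saved field
coordinates, a None result means "the correct text is not in the correct
place", which is the assertion the test wants. A nonzero tolerance would be
needed if the renderer antialiases differently from the PDF rasterizer.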


Most people who generate documents have a model described SOMEWHERE, even if
they have to absolutely positively remove every trace of it before publishing
their standard professional result. It isn't entirely cheating to use this
model for testing, and you can appreciate how useful it would be to those of
us who get stuck using your pixel creation too.

>
> Realistically? Umm… ouch.
> Actually, the pdf.parser.PdfTextExtractor could be Quite Helpful. Yeah…
> ! Check out SimpleTextExtractingPdfContentStreamProcessor. With a name like
> that, it must be easy, right?
>
                                          
------------------------------------------------------------------------------

_______________________________________________
iText-questions mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/itext-questions

Buy the iText book: http://www.itextpdf.com/book/
Check the site with examples before you ask questions: 
http://www.1t3xt.info/examples/
You can also search the keywords list: http://1t3xt.info/tutorials/keywords/
