Thank you.

Actually, I haven't settled on whether I prefer the right size in the copied
content or the right selection rectangles. I have to check which use case is
dominant:

- Copying OCR content from the PDF
- Searching/indexing/reading the PDF

For the former I'd need to give better/correct sizes. For the latter it
feels much more natural to have "correct" selection rectangles while
searching/navigating the document.

Anyway, thanks for the advice.
Ben

On Mon, 12 Jan 2009 09:36:38 -0800, Leonard Rosenthol <[email protected]>
wrote:
> No, there is no way to solve both.
> 
> I would think that you would WANT the text to come out with the correct
> sizes when copied/pasted...
> 
> Leonard
> 
> -----Original Message-----
> From: Benjamin Podszun [mailto:[email protected]]
> Sent: Monday, January 12, 2009 12:06 PM
> To: [email protected]
> Subject: [iText-questions] Searchable PDF, cosmetic issue: Copying text
> with format results in "wrong" sizes
> 
> Hi there.
> 
> Trying to polish my TIFF to searchable PDF (optionally PDF/A) code, I'm
> currently faced with a rather cosmetic issue, but one co-worker already
> found it and complained (albeit grinning) about it.
> 
> When I created this code I tried (successfully) to add the text to the
> document in the "right" size so that the selection covers the right area.
> I wanted a search for "foo" to highlight the complete occurrence of foo.
> If you highlight anything in the sample file, it seems like you're
> highlighting the printed text (although that's just an image; the real
> text is invisible and OCR-generated).
> 
> My code does it in a very stupid and naive way like this (ignore the
> mixed casing, I'm using the Java iText library in C# via IKVM to support
> PDF/A properly):
> 
> private static float CalculateFontSize(BaseFont font, double maximumWidth,
>     string contents) {
>   // Calculate the maximum font size for a given string and bound
>   int intSize = 3;
> 
>   if (font.getWidthPoint(contents, intSize) < 0.1f) return 0f;
> 
>   while (font.getWidthPoint(contents, intSize) < maximumWidth)
>     intSize++;
> 
>   double fontSize = intSize;
>   while (font.getWidthPoint(contents, (float)fontSize) > maximumWidth)
>     fontSize -= 0.1;
> 
>   return (float)fontSize;
> }
> 
> This computes the font size that would be needed to stay just within a
> given width and works fine so far. But it results in rather large numbers
> (the sample PDF's headline results roughly in a font size of 80 for
> example).
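
A side note on the sizing loop above: as far as I can tell, iText's
BaseFont.getWidthPoint(String, float) scales linearly with the font size, so
the fitted size can be computed with one division instead of by stepping.
Below is a minimal Java sketch under that assumption. WidthMeasurer and
fitFontSize are hypothetical names used only for illustration; the interface
stands in for the real font object so the example runs without iText on the
classpath.

```java
final class FontFit {
    /** Hypothetical stand-in for a width query such as BaseFont.getWidthPoint(String, float). */
    interface WidthMeasurer {
        float widthPoint(String text, float fontSize);
    }

    /**
     * Returns the font size at which the text is exactly maximumWidth wide,
     * assuming the measured width grows linearly with the font size.
     * Returns 0 for text without measurable width or a non-positive bound.
     */
    static float fitFontSize(WidthMeasurer font, double maximumWidth, String contents) {
        float widthAtUnitSize = font.widthPoint(contents, 1f);
        if (widthAtUnitSize <= 0f || maximumWidth <= 0) {
            return 0f;
        }
        return (float) (maximumWidth / widthAtUnitSize);
    }

    public static void main(String[] args) {
        // Toy measurer (demo assumption only): every character is 0.5 em wide.
        WidthMeasurer toy = (text, size) -> text.length() * 0.5f * size;
        System.out.println(fitFontSize(toy, 120.0, "foo")); // prints 80.0
    }
}
```

With a real font the measurer would presumably just delegate, for example
(text, size) -> font.getWidthPoint(text, size). This only shortens the
calculation; it does not change the copy/paste behaviour discussed in this
thread.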
> 
> The mentioned co-worker now tried Ctrl+A, Ctrl+C, open Word(Pad), Ctrl+V:
> the text is copied including the formatting, and the resulting document
> all of a sudden contains HUGE words/lines. I wonder if it's possible to
> solve both issues (nice selection, sensible result)? Can tags help here?
> Can I maybe solve the selection rectangle problem in another way, sticking
> to the correct font sizes for the text?
> 
> I admit that the issue is very small, but if anyone has an idea what I
> could improve here I'd be glad to polish this thing some more.
> I attached a sample PDF to demonstrate the issue better than my rather
> limited English can.
> 
> Regards & thanks in advance,
> Ben
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by:
> SourcForge Community
> SourceForge wants to tell your story.
> http://p.sf.net/sfu/sf-spreadtheword
> _______________________________________________
> iText-questions mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/itext-questions
> 
> Buy the iText book: http://www.1t3xt.com/docs/book.php


