#2 is the main use case - which is why all existing OCR implementations do what you currently do and match the size...
Leonard

-----Original Message-----
From: Benjamin Podszun [mailto:[email protected]]
Sent: Tuesday, January 13, 2009 3:26 AM
To: Post all your questions about iText here
Subject: Re: [iText-questions] Searchable PDF, cosmetic issue: Copying text with format results in "wrong" sizes

Thank you.

Actually I haven't settled on whether I prefer the right size in the copied content or the right selection rectangles. I have to check which use case is dominant:
- Copying OCR content from the PDF
- Searching/indexing/reading the PDF

For the former I'd need to give better/correct sizes. For the latter it feels much more natural to have "correct" selection rectangles while searching/navigating the document.

Anyway, thanks for the advice.

Ben

On Mon, 12 Jan 2009 09:36:38 -0800, Leonard Rosenthol <[email protected]> wrote:
> No, there is no way to solve both.
>
> I would think that you would WANT the text to come out with the correct
> sizes when copied/pasted...
>
> Leonard
>
> -----Original Message-----
> From: Benjamin Podszun [mailto:[email protected]]
> Sent: Monday, January 12, 2009 12:06 PM
> To: [email protected]
> Subject: [iText-questions] Searchable PDF, cosmetic issue: Copying text
> with format results in "wrong" sizes
>
> Hi there.
>
> Trying to polish my tif to searchable pdf (optional pdf/a) code I'm
> currently faced with a rather cosmetic issue, but one co-worker already
> found it and complained (albeit grinning..) about a minor thing..
>
> When I created this code I tried (successfully) to add the text to the
> document in the "right" size so that the selection covers the right area. I
> wanted a search for "foo" highlighting the complete occurrence of foo. If
> you highlight anything in the sample file it seems like you're highlighting
> the printed text (although that's just an image, the real text is invisible
> and OCR generated).
>
> My code does it in a very stupid and naive way like this (ignore the mixed
> casing, I'm using the java itext library in C# using ikvm to support pdf/a
> properly):
>
> private static float CalculateFontSize(BaseFont font, double maximumWidth,
>         string contents) {
>     // Calculate the maximum font size for a given string and bound
>     int intSize = 3;
>
>     if (font.getWidthPoint(contents, intSize) < 0.1f) return 0f;
>
>     while (font.getWidthPoint(contents, intSize) < maximumWidth)
>         intSize++;
>
>     double fontSize = intSize;
>     while (font.getWidthPoint(contents, (float)fontSize) > maximumWidth)
>         fontSize -= 0.1;
>
>     return (float)fontSize;
> }
>
> This computes the font size that would be needed to stay just within a
> given width and works fine so far. But it results in rather large numbers
> (the sample PDF's headline results roughly in a font size of 80 for
> example).
>
> The mentioned co-worker now tried ctrl + a, open word(pad), ctrl + c ->
> The text is copied including the formatting and the resulting document
> contains all of a sudden HUGE words/lines. I wonder if it's possible to
> solve both issues (nice selection, sensible result)? Can tags help here?
> Can I maybe solve the selection rectangle problem in another way, sticking
> to the correct font sizes for the text?
>
> I admit that the issue is very small, but if anyone has an idea what I
> could improve here I'd be glad to polish this thing some more.
> I attached a sample PDF to demonstrate the issue better than my rather
> limited English can.
>
> Regards & thanks in advance,
> Ben
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by:
> SourcForge Community
> SourceForge wants to tell your story.
> http://p.sf.net/sfu/sf-spreadtheword
> _______________________________________________
> iText-questions mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/itext-questions
>
> Buy the iText book: http://www.1t3xt.com/docs/book.php
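
A side note on the CalculateFontSize routine quoted in this thread: as far as I know, BaseFont.getWidthPoint(String, float) scales linearly with the font size, so the fitted size can probably be computed in one division instead of two search loops. The sketch below is in plain Java and only illustrates that idea. FontFit, WidthFunction and fitFontSize are made-up names, and WidthFunction is a hypothetical stand-in for the real font object (with iText one would presumably pass font::getWidthPoint). Unlike the original, which steps down in 0.1 increments, this returns the exact fit, so the text may touch the width bound rather than stay just under it.

```java
final class FontFit {
    /** Hypothetical stand-in for BaseFont.getWidthPoint(String, float). */
    interface WidthFunction {
        float widthPoint(String text, float fontSize);
    }

    /**
     * Returns the font size at which contents is exactly maximumWidth wide,
     * assuming the width grows proportionally with the font size.
     * Returns 0 when the text has no measurable width (e.g. empty string).
     */
    static float fitFontSize(WidthFunction font, float maximumWidth, String contents) {
        float widthAtUnitSize = font.widthPoint(contents, 1f);
        if (widthAtUnitSize <= 0f) {
            return 0f;
        }
        return maximumWidth / widthAtUnitSize;
    }

    public static void main(String[] args) {
        // Toy metric for illustration only: every character is 0.5 em wide.
        WidthFunction toy = (text, size) -> text.length() * 0.5f * size;
        System.out.println(fitFontSize(toy, 120f, "foo")); // prints 80.0
    }
}
```

This does not change the trade-off discussed above: the size is still chosen to match the scanned text's width, so copied text keeps the large sizes.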
