[iText-questions] [SPAM] Re: Searchable Pdf using Itext

Hans Petrich Tue, 16 Feb 2010 07:53:17 -0800

I struggled with this issue for a while and ended up putting invisible,
searchable text behind the images.  If you scale the text to fit the
dimensions of the image, it works perfectly.  Use the PdfContentByte and set
its text rendering mode to invisible
(setTextRenderingMode(PdfContentByte.TEXT_RENDER_MODE_INVISIBLE)).
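
To make the idea concrete, here is a minimal sketch using only the JDK. It builds the raw PDF text operators that an invisible, width-fitted text layer boils down to: `3 Tr` selects the invisible rendering mode and `Tz` sets the horizontal scaling. The class name, the method names and the font resource name `F1` are hypothetical, and the text itself is assumed to come from an OCR step outside iText. With iText you would issue the equivalent calls on a PdfContentByte (beginText, setTextRenderingMode, setHorizontalScaling, showText, endText) rather than writing operators by hand.

```java
import java.util.Locale;

// Hypothetical helper, not part of iText: shows which PDF operators an
// invisible, image-fitted text layer consists of.
final class InvisibleTextLayer {

    // Horizontal scaling in percent that stretches text whose unscaled width
    // is naturalWidth so that it spans targetWidth (the image width).
    static float horizontalScaling(float naturalWidth, float targetWidth) {
        if (naturalWidth <= 0f) {
            throw new IllegalArgumentException("naturalWidth must be positive");
        }
        return 100f * targetWidth / naturalWidth;
    }

    // Escapes the characters that are special inside a PDF literal string.
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    // Builds a text object: invisible mode (3 Tr), font and size (Tf),
    // horizontal scaling (Tz), position (Td) and the text itself (Tj).
    static String textOperators(String fontName, float fontSize,
                                float x, float y, float scaling, String text) {
        return String.format(Locale.ROOT,
                "BT\n3 Tr\n/%s %.2f Tf\n%.2f Tz\n%.2f %.2f Td\n(%s) Tj\nET\n",
                fontName, fontSize, scaling, x, y, escape(text));
    }
}
```

The unscaled width would normally come from the font metrics (in iText, BaseFont.getWidthPoint), and the position from where the image sits on the page.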


Hope it helps.

On Tue, Feb 16, 2010 at 8:18 AM, <
[email protected]> wrote:

> Today's Topics:
>
>   1. Re: Searchable Pdf using Itext (1T3XT info)
>   2. Re: No fields returned by AcroFields.Count (1T3XT info)
>   3. Re: No fields returned by AcroFields.Count (1T3XT info)
>   4. Re: Umlaut        problem with SnowLeopard font (Paulo Soares)
>   5. Re: RandomAccessFileOrArray file load in memory (Paulo Soares)
>   6. Re: identify page content which is not marked and missing
>      marked page content in structure tree (Leonard Rosenthol)
>   7. Re: RandomAccessFileOrArray file load in memory (Mike Marchywka)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 16 Feb 2010 11:26:50 +0100
> From: 1T3XT info <[email protected]>
> Subject: Re: [iText-questions] Searchable Pdf using Itext
> To: [email protected], Post all your questions about iText here
>        <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> [email protected] wrote:
> > OCR only converts image to text but can it convert image type to
> > searchable pdfs?
> > 1T3XT info wrote:
> >> hemabalan wrote:
> >>> Is it possible to convert an image type pdf to searchable pdf using
> >>> itext
> >> No, you'd need an OCR tool to do that.
>
> Please don't answer to private e-mails when you get an answer
> from the mailing-list.
>
> Please don't send further mails to the mailing-list if your
> question is marked 'off-topic'.
>
> Text in images isn't searchable.
> To make them searchable, you need text.
> To get text from an image, you need OCR.
>
> iText doesn't do OCR, so your question is off-topic on the list.
> --
> This answer is provided by 1T3XT BVBA
> http://www.1t3xt.com/ - http://www.1t3xt.info
>
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 16 Feb 2010 11:27:52 +0100
> From: 1T3XT info <[email protected]>
> Subject: Re: [iText-questions] No fields returned by AcroFields.Count
> To: Post all your questions about iText here
>        <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Lucian Hancu wrote:
> >
> > Dear all,
> >
> > I have a major problem with itext, which has not been solved by version
> > 5.0
> >
> > My pdf has several fields, I can see them correctly in Acrobat 8.1.2 and
> > above. Itext instead returns 0 (zero) fields.
> >
> > Please give me some hints in how to solve it
>
> That's explained in chapter 8 of the new book about iText.
> If iText doesn't see any fields, the form is not an AcroForm.
> It's probably a dynamic form.
> --
> This answer is provided by 1T3XT BVBA
> http://www.1t3xt.com/ - http://www.1t3xt.info
>
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 16 Feb 2010 11:35:07 +0100
> From: 1T3XT info <[email protected]>
> Subject: Re: [iText-questions] No fields returned by AcroFields.Count
> To: Post all your questions about iText here
>        <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Sinha, Abhishek wrote:
> > Hi,
> >
> >    I do not wish to receive these emails anymore. Please do the needful.
>
> You can easily do this yourself:
> https://lists.sourceforge.net/lists/listinfo/itext-questions
> --
> This answer is provided by 1T3XT BVBA
> http://www.1t3xt.com/ - http://www.1t3xt.info
>
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 16 Feb 2010 11:05:55 -0000
> From: "Paulo Soares" <[email protected]>
> Subject: Re: [iText-questions] Umlaut   problem with SnowLeopard font
> To: "Post all your questions about iText here"
>        <[email protected]>
> Message-ID: <004801caaef8$caa7afb0$aa302...@psoaresw>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
>        reply-type=original
>
> What exactly is your fix?
>
> Paulo
>
> ----- Original Message -----
> From: "Christoph Wagner" <[email protected]>
> To: "'Post all your questions about iText here'"
> <[email protected]>
> Sent: Monday, February 15, 2010 11:08 PM
> Subject: Re: [iText-questions] Umlaut problem with SnowLeopard font
>
>
> Hi everyone,
>
> I debugged the problem and I was able to locate the bug in iText. The cmap
> parsing code in TrueTypeFont.readCMaps() does not handle Mac Unicode ttf
> correctly. Adding that fixes the problem for me.
>
> Thanks for your help,
> Christoph Wagner
>
>
>
>
>
> ------------------------------
>
> Message: 5
> Date: Tue, 16 Feb 2010 11:11:05 -0000
> From: "Paulo Soares" <[email protected]>
> Subject: Re: [iText-questions] RandomAccessFileOrArray file load in
>        memory
> To: "Post all your questions about iText here"
>        <[email protected]>
> Message-ID: <004901caaef8$ce5a3060$aa302...@psoaresw>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
>        reply-type=original
>
> For lots of users you'll need lots of memory and if you're processing big
> images you'll need to partially do it in memory as iText does. No miracles
> here.
>
> Paulo
>
> ----- Original Message -----
> From: "Christophe from paris" <[email protected]>
> To: <[email protected]>
> Sent: Monday, February 15, 2010 3:07 PM
> Subject: [iText-questions] RandomAccessFileOrArray file load in memory
>
>
>
> Hello,
>
> I'm facing a big problem with out-of-memory errors on the server when many
> users request a PDF file generated from a TIFF file.
>
> I use iText 1.2.7.
>
> Is it possible to override RandomAccessFileOrArray to replace the byte
> arrayIn[] field with a temporary file?
>
> Or to use a temporary file when the file is larger than, for example, 5 KB?
>
> Thank you,
>
>
>
>
> ------------------------------
>
> Message: 6
> Date: Tue, 16 Feb 2010 03:35:23 -0800
> From: Leonard Rosenthol <[email protected]>
> Subject: Re: [iText-questions] identify page content which is not
>        marked and missing marked page content in structure tree
> To: "[email protected]"
>        <[email protected]>
> Message-ID:
>        <d23d6b9e57d654429a9ab6918caceaa97ca60ba...@nambx02.corp.adobe.com>
> Content-Type: text/plain; charset="us-ascii"
>
> If you are scanning the content stream, then you'll easily find content
> that isn't marked.
>
> I would, however, point you to the current drafts of PDF/UA (ISO 14289)
> which speak a LOT about this type of work...
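
A rough sketch of that scan, using only the JDK. It assumes the content stream has already been tokenized into an ordered list of operator names (a real stream also carries operands and needs a proper tokenizer, which iText provides), and the class and method names are hypothetical. It tracks the nesting depth of BMC/BDC ... EMC sequences and reports every painting operator that occurs at depth zero, i.e. outside any marked-content sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: flags painting operators that are not enclosed in a
// marked-content sequence.
final class UnmarkedContentFinder {

    // Operators that put marks on the page: text showing, XObjects,
    // path painting, shading and inline images.
    private static final Set<String> PAINTING = new HashSet<>(Arrays.asList(
            "Tj", "TJ", "'", "\"", "Do", "S", "s", "f", "F", "f*",
            "B", "B*", "b", "b*", "sh", "BI"));

    // Returns the indexes (into the operator list) of painting operators
    // found while no BMC/BDC sequence is open.
    static List<Integer> unmarkedPaintingOps(List<String> operators) {
        List<Integer> hits = new ArrayList<>();
        int depth = 0;
        for (int i = 0; i < operators.size(); i++) {
            String op = operators.get(i);
            if (op.equals("BMC") || op.equals("BDC")) {
                depth++;
            } else if (op.equals("EMC")) {
                if (depth > 0) {
                    depth--;
                }
            } else if (depth == 0 && PAINTING.contains(op)) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

Note that this treats any enclosing sequence as "marked", including ones without an MCID such as artifacts; checking the MCIDs against the structure tree remains a separate step.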
>
> -----Original Message-----
> From: newoutlook [mailto:[email protected]]
> Sent: Monday, February 15, 2010 3:22 PM
> To: [email protected]
> Subject: [iText-questions] identify page content which is not marked and
> missing marked page content in structure tree
>
>
> I want to identify page content which is not marked, and marked page
> content that is missing from the structure tree. Currently, I get the MCIDs
> (marked content identifiers) from the page content stream and check that the
> marked content is in the structure tree. This works OK. How do I identify
> page content which is not marked? Any help, please?
>
> Sal Salaimani
>
>
>
> --
> View this message in context:
> http://old.nabble.com/identify-page-content-which-is-not-marked-and-missing-marked-page-content-in-structure-tree-tp27599425p27599425.html
> Sent from the iText - General mailing list archive at Nabble.com.
>
>
>
> ------------------------------------------------------------------------------
> SOLARIS 10 is the OS for Data Centers - provides features such as DTrace,
> Predictive Self Healing and Award Winning ZFS. Get Solaris 10 NOW
> http://p.sf.net/sfu/solaris-dev2dev
> _______________________________________________
> iText-questions mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/itext-questions
>
> Buy the iText book: http://www.1t3xt.com/docs/book.php
> Check the site with examples before you ask questions:
> http://www.1t3xt.info/examples/
> You can also search the keywords list:
> http://1t3xt.info/tutorials/keywords/
>
>
>
> ------------------------------
>
> Message: 7
> Date: Tue, 16 Feb 2010 07:47:28 -0500
> From: Mike Marchywka <[email protected]>
> Subject: Re: [iText-questions] RandomAccessFileOrArray file load in
>        memory
> To: <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="iso-8859-1"
>
>
> ----------------------------------------
> > Date: Mon, 15 Feb 2010 07:07:50 -0800
> > From:
> > To: [email protected]
> > Subject: [iText-questions] RandomAccessFileOrArray file load in memory
> >
> >
> > Hello,
> >
> > I'm facing a big problem with out-of-memory errors on the server when many
> > users request a PDF file generated from a TIFF file.
>
> This may be considered OT for iText, and maybe someone has a better answer,
> but given all my earlier comments about the resource usage of PDF
> processing, the short answer is: see whether anyone at sun.com has
> described similar issues and solutions for other server software, and try
> adapting those to iText for your own needs, unless Bruno has ready-made
> implementation alternatives. For requests that just take a long time, you
> may have to change your paradigm and notify the user later, via email or
> something similar, when the result is done. These issues are not specific
> to iText or PDF.
>
> >
> > I use iText 1.2.7.
> >
> > Is it possible to override RandomAccessFileOrArray to replace the byte
> > arrayIn[] field with a temporary file?
>
> Generally, memory management in Java is quite limited, and long before you
> run out of memory you would want to do things like maximize low-level cache
> hits. However, there may be something on sun.com, as this is likely to
> be a common issue when you scale Java apps (I've never bothered to look
> myself, but it wouldn't just be about iText), and "you have the source code",
> so you can take alternative approaches. Code is never really platform
> independent, and implementation details make or break real-world utility
> (hence the issues with PDF resource needs and benefits).
>
> Also note that if all the users are translating the same image, in-memory
> caching of single objects, rather than duplicating them hundreds of times,
> can be a big savings. You need a sharing mechanism in this case. A
> "scalable iText" or something like that would probably be a commercial
> product :)
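
One possible shape for such a sharing mechanism, sketched with JDK classes only. The class name, the string key and the converter function are hypothetical stand-ins for however the application identifies a source image and runs the TIFF-to-PDF conversion. `ConcurrentHashMap.computeIfAbsent` makes sure that each key is converted once and that later requests reuse the same result.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical helper: shares one conversion result among all requests
// for the same source image.
final class SharedConversionCache {

    private final ConcurrentMap<String, byte[]> cache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> converter;

    // converter is assumed to perform the actual image-to-PDF conversion
    // for a given image key (for example a file path or a checksum).
    SharedConversionCache(Function<String, byte[]> converter) {
        this.converter = converter;
    }

    // Returns the cached PDF bytes, converting only on the first request.
    byte[] pdfFor(String imageKey) {
        return cache.computeIfAbsent(imageKey, converter);
    }
}
```

This sketch never evicts anything, so it only helps when the set of distinct images is small; a real deployment would need a size bound or an expiry policy.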
>
>
> Assuming you have zero virtual memory right now, this is just going to
> slow things down even more (presumably your current "out of memory"
> condition has already been addressed with increased heap size, to the
> point of doing a lot of VM thrashing), and it could get to the point where
> each request takes forever as the whole system thrashes between requests
> (you can probably write a simple equation to determine the number of
> executing requests given the arrival rate and processing time, with
> processing time increasing with the number of active requests). You might
> just be better off limiting the number of active requests, queueing the
> rest, and notifying the user when done, if you are currently trying to
> return a complete PDF to the user via the requesting HTTP connection.
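
Limiting the number of active requests and queueing the rest could look roughly like this with the JDK's executor framework. The class and method names are hypothetical, and the conversion itself is passed in as a Callable. A fixed-size thread pool runs at most `maxActive` conversions at once and holds the remaining submissions in its internal queue.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: caps the number of conversions running at once.
final class BoundedConversionQueue {

    private final ExecutorService workers;

    // At most maxActive conversions execute concurrently; further
    // submissions wait in the pool's queue until a worker is free.
    BoundedConversionQueue(int maxActive) {
        this.workers = Executors.newFixedThreadPool(maxActive);
    }

    // Queues a conversion and returns a handle to its eventual result.
    Future<byte[]> submit(Callable<byte[]> conversion) {
        return workers.submit(conversion);
    }

    // Waits for a result, wrapping checked exceptions for convenience.
    static byte[] await(Future<byte[]> result) {
        try {
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("conversion failed", e.getCause());
        }
    }

    // Stops accepting new work; already queued conversions still run.
    void shutdown() {
        workers.shutdown();
    }
}
```

Instead of blocking the HTTP request on `await`, the application could keep the Future and notify the user once it completes, which is the paradigm change described above. The queue in this sketch is unbounded, so it trades memory pressure from running conversions for a backlog of waiting ones.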
>
>
> >
> > Or to use a temporary file when the file is larger than, for example, 5 KB?
> >
> > Thank you,
> > --
> > View this message in context:
> http://old.nabble.com/RandomAccessFileOrArray-file-load-in-memory-tp27595180p27595180.html
> > Sent from the iText - General mailing list archive at Nabble.com.
>
> ------------------------------
>
>
> End of iText-questions Digest, Vol 45, Issue 58
> ***********************************************
>

