I don't want to render my pages as images; that is the point. I will extract the text of my PDF, which should be easy.
In the same pass I will extract all the images from the PDF and run OCR on them to get the text of each image. So I really need to get every image out of the PDF. I managed to get the raw bytes of an image; now I need to turn them into a valid java.awt.Image or BufferedImage.


Mike Marchywka-2 wrote:
>
> You can always use the command line tool in pdf toolkit or xpdf,
> I can't remember which but there is something like
> pdf2image similar to pdf2text to extract text.
>
> ----------------------------------------
>> Date: Tue, 23 Feb 2010 12:43:28 -0800
>> From: fernandogomes...@hotmail.com
>> To: itext-questions@lists.sourceforge.net
>> Subject: Re: [iText-questions] Using Images extracted from a pdf
>>
>>
>> I'm going crazy with this. As you can see, I have never manipulated images
>> at such a low level, and I don't have much sense of how things work. I have
>> been searching for days to finish my solution, and I'm already getting
>> stressed. I keep testing methods; I want to get this working before trying
>> another option.
>> -.-
>>
>> Can you give me some more assistance on how I can turn this array of
>> bytes back into an image?
>>
>> Couldn't the API have just one class that does this? :P
>>
>> Pdfimages buf = new pdfimages (myRawImageByteArray);
>> buf.getAsBufferedImage ();
>>
>> :P
>>
>> If you say you cannot help me, all right, but can you point me to some
>> material I can rely on to get this done?
>>
>> Thanks.
>>
>>
>> Leonard Rosenthol-3 wrote:
>>>
>>> The image is decompressed and then "injected" into the PDF. Same with
>>> EVERY TYPE of image EXCEPT JPEG.
>>>
>>> -----Original Message-----
>>> From: Fernando Gomes [mailto:fernandogomes...@hotmail.com]
>>> Sent: Tuesday, February 23, 2010 3:21 PM
>>> To: itext-questions@lists.sourceforge.net
>>> Subject: Re: [iText-questions] Using Images extracted from a pdf
>>>
>>>
>>> ty ..
>>>
>>> I have a question.
>>> When I insert an image that is not a JPEG,
>>> what exactly happens to it?
>>>
>>> Say it is a PNG: is it decompressed before being "injected" into the PDF?
>>>
>>> Or does it keep its PNG format, with the bytes encoded using
>>> FlateDecode?
>>>
>>> In that case it would just be a matter of finding the filter and decoding.
>>>
>>> And if the image is decompressed before being inserted into the PDF, how
>>> do I know which type of encoding the image uses?
>>>
>>>
>>> Leonard Rosenthol-3 wrote:
>>>>
>>>> Bits per pixel is the BitsPerComponent value in the image object
>>>>
>>>> Pixels per line (POR LINHA) is _NOT_ Width * bits. It's Width *
>>>> NumComponents, where NumComponents is based on the colorspace in
>>>> question (eg. RGB == 3, CMYK == 4).
>>>>
>>>> -----Original Message-----
>>>> From: Fernando Gomes [mailto:fernandogomes...@hotmail.com]
>>>> Sent: Tuesday, February 23, 2010 2:00 PM
>>>> To: itext-questions@lists.sourceforge.net
>>>> Subject: Re: [iText-questions] Using Images extracted from a pdf
>>>>
>>>>
>>>>> public static BufferedImage createBufferedImageFromRawBytes(byte[] bytes,
>>>>>         int width, int height, int bits)
>>>>>         throws BadElementException, MalformedURLException, IOException {
>>>>>     com.lowagie.text.Image img =
>>>>>         com.lowagie.text.Image.getInstance(bytes);
>>>>>
>>>>>     DataBuffer db = new DataBufferByte(img.getRawData(),
>>>>>         img.getRawData().length);
>>>>>
>>>>>     WritableRaster raster = Raster.createPackedRaster(db, // DATA BUFFER
>>>>>         width,        // LARGURA (width)
>>>>>         height,       // ALTURA (height)
>>>>>         width * bits, // LARGURA * BITS POR PIXEL = PIXEL POR LINHA
>>>>>                       // (width * bits per pixel = pixels per line) -> scanlineStride
>>>>>         // bits,      // BITS POR PIXEL (bits per pixel) -> pixelStride
>>>>>         new int[] {bits},
>>>>>         null);
>>>>>
>>>>>     ColorSpace cs = ColorSpace.getInstance(img.getColorspace());
>>>>>     ColorModel cm = new ComponentColorModel(cs, false, false,
>>>>>         Transparency.OPAQUE, db.getDataType());
>>>>>     BufferedImage bi = new BufferedImage(cm, raster, false, null);
>>>>>     return null;
>>>>> }
>>>>>
>>>>
>>>> This code is as far as I could get, but there are parameters needed to
>>>> generate the BufferedImage that I'm not sure of. Could someone please
>>>> help me see whether I'm on track, or whether I wrote something wrong?
>>>>
>>>>
>>>> Fernando Gomes wrote:
>>>>>
>>>>> Can anyone help me one more time?
>>>>> I don't know what to do.
>>>>>
>>>>> I need to get the image bytes, now decoded...
>>>>>
>>>>>> String colorSpace = pdfStrem.get(PdfName.COLORSPACE).toString();
>>>>>> String filter = pdfStrem.get(PdfName.FILTER).toString();
>>>>>> int bits =
>>>>>>     Integer.valueOf(pdfStrem.get(PdfName.BITSPERCOMPONENT).toString());
>>>>>> int width =
>>>>>>     Integer.valueOf(pdfStrem.get(PdfName.WIDTH).toString());
>>>>>> int height =
>>>>>>     Integer.valueOf(pdfStrem.get(PdfName.HEIGHT).toString());
>>>>>> PdfDictionary param =
>>>>>>     (PdfDictionary)pdfStrem.get(PdfName.DECODEPARMS);
>>>>>> int colors =
>>>>>>     Integer.valueOf(param.get(PdfName.COLORS).toString());
>>>>>> int predictor =
>>>>>>     Integer.valueOf(param.get(PdfName.PREDICTOR).toString());
>>>>>> int colums =
>>>>>>     Integer.valueOf(param.get(PdfName.COLUMNS).toString());
>>>>>> if(filter.equals("/FlateDecode"))
>>>>>> {
>>>>>>     byte[] bytesDecod = PdfReader.FlateDecode(bytes);
>>>>>
>>>>> This is all the information I can pull out of the PDF.
>>>>>
>>>>> What do I have to do to create my image in general?
>>>>> I'm trying to do it, or learn how, but it is hard; all my attempts have
>>>>> failed.
>>>>> ty
>>>>>
>>>>>
>>>>> Fernando Gomes wrote:
>>>>>>
>>>>>> Sirs, really sorry for the duplicates. Can the other topics be deleted?
>>>>>> So sorry .. :blush:
>>>>>>
>>>>>> Many thanks for the help,
>>>>>> and for such good, fast help.
>>>>>> I will study more.
>>>>>>
>>>>>>
>>>>>> Leonard Rosenthol-3 wrote:
>>>>>>>
>>>>>>> You are assuming that PDF maintains the PNG nature of the image -
>>>>>>> that is NOT the case. PDF only supports two kinds of images: JPEG
>>>>>>> (which is why this works) and "raw bitmaps" (aka an array of bits).
>>>>>>> So in your case, with the PNG, it is transcoded into the latter case
>>>>>>> and so if you want it back you will need to reverse the process on
>>>>>>> your end.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> This response came in the other, duplicate email :blush:
>>>>>> The quote from "1T3XT info" is below.
>>>>>>
>>>>>> Really, thanks. I should have seen the relevance of the chapter that you
>>>>>> mentioned; I will read it again, very carefully. My English is very
>>>>>> weak, and it is very difficult to read.
>>>>>>
>>>>>> You are very funny, I laughed a lot. I know I deserved the scolding.
>>>>>> Really, thanks for your help. I will test and then come back to post
>>>>>> the result.
>>>>>> Thank you!
>>>>>>
>>>>>>
>>>>>> 1T3XT info wrote:
>>>>>>>
>>>>>>> Fernando Henrique Gomes wrote:
>>>>>>>> the problem is when I insert an image in PNG format and then try to
>>>>>>>> get the same...
>>>>>>>
>>>>>>> OK, we're talking about a PNG.
>>>>>>> If you've read chapter 10 of the 2nd edition of "iText in Action",
>>>>>>> you know that PNGs are transformed into zipped pixels.
>>>>>>> If you didn't know, you should read the book!
>>>>>>>
>>>>>>>> on here i try to take that image...
>>>>>>>>
>>>>>>>> [code]
>>>>>>>> int XrefIndex =((PRIndirectReference)obj).getNumber();
>>>>>>>> PdfObject pdfObj = pdf.getPdfObject(XrefIndex);
>>>>>>>> PdfStream pdfStrem = (PdfStream)pdfObj;
>>>>>>>> byte[] bytes =
>>>>>>>>     PdfReader.getStreamBytesRaw((PRStream)pdfStrem);
>>>>>>>> if ((bytes != null)) {
>>>>>>>>     String fileName = "Image_P"+pageNumber+"_";
>>>>>>>>     File file = new File(fileName);
>>>>>>>>     FileOutputStream fw = new FileOutputStream(file);
>>>>>>>>     fw.write(bytes);
>>>>>>>>     fw.flush();
>>>>>>>>     fw.close();
>>>>>>>>     BufferedImage img2 = ImageIO.read(file);
>>>>>>>>     com.lowagie.text.Image img =
>>>>>>>>         com.lowagie.text.Image.getInstance(file.toURL());
>>>>>>>> }
>>>>>>>> [/code]
>>>>>>>>
>>>>>>>> img2 returned a null !!!!
>>>>>>>
>>>>>>> Of course, why do you think that would work???
>>>>>>>
>>>>>>>> the img line throws an Exception:
>>>>>>>> "Image_P1_ is not a recognized imageformat"
>>>>>>>
>>>>>>> Of course, you're sending iText a bunch of pixels,
>>>>>>> but: what are the dimensions of the image,
>>>>>>> how many bits are there per component?
>>>>>>>
>>>>>>>> when i try to do :
>>>>>>>> [code]
>>>>>>>> Image image = Toolkit.getDefaultToolkit().createImage(bytes);
>>>>>>>> [/code]
>>>>>>>>
>>>>>>>> and then create an image from this image, getting the width and
>>>>>>>> height from my PdfStream (create a buffered image and draw the image),
>>>>>>>> when i serialize it to a file and view it.. the image is a fucking
>>>>>>>> black picture .. all black -.-
>>>>>>>
>>>>>>> It's because you don't have a fucking clue about what you're doing :P
>>>>>>> Hehe, I was waiting for an occasion to use the F* word on the list.
>>>>>>> Thanks!
>>>>>>>
>>>>>>>> if i use JPEG encode for my images.. all the 3 solutions i have ..
>>>>>>>> are ok.. they work..
>>>>>>>
>>>>>>> Well, that's because iText stores JPEGs literally as a JPEG without
>>>>>>> changing any of the bytes. If you look inside, you'll see that the
>>>>>>> filter is DCTDecode (Discrete Cosine Transform).
>>>>>>>
>>>>>>>> i can visualize my images how to i create then .. perfect..
>>>>>>>> but if i change the JPEG ... for any other encode.. thats not have
>>>>>>>> effect ..
>>>>>>>
>>>>>>> No idea what you're saying here, but you also need to study images.
>>>>>>>
>>>>>>>> can anyone help me plz ?
>>>>>>>
>>>>>>> This example doesn't involve iText, but explains what you're
>>>>>>> missing.
>>>>>>>
>>>>>>> Let's create an image byte per byte:
>>>>>>>
>>>>>>> byte b[] = new byte[256 * 3];
>>>>>>> for (int i = 0; i < 256; i++) {
>>>>>>>     b[i * 3] = (byte) (255 - i);
>>>>>>>     b[i * 3 + 1] = (byte) (255 - i);
>>>>>>>     b[i * 3 + 2] = (byte) i;
>>>>>>> }
>>>>>>>
>>>>>>> This is how a PNG, GIF, and some other image types are stored
>>>>>>> in a PDF, but in zipped format (FlateDecode). These bytes don't
>>>>>>> make any sense if you don't know the bpc, color space and
>>>>>>> dimensions.
>>>>>>>
>>>>>>> If you want to create an image from these bytes, you could do this:
>>>>>>>
>>>>>>> DataBuffer db = new DataBufferByte(b, b.length);
>>>>>>> WritableRaster raster = Raster.createInterleavedRaster(
>>>>>>>     db, 16, 16, 48, 3, new int[]{0,1,2}, null);
>>>>>>> ColorSpace cs = ColorSpace.getInstance(ColorSpace.CS_sRGB);
>>>>>>> ColorModel cm = new ComponentColorModel(
>>>>>>>     cs, false, false, Transparency.OPAQUE, DataBuffer.TYPE_BYTE);
>>>>>>> BufferedImage bi = new BufferedImage(cm, raster, false, null);
>>>>>>> ImageIO.write(bi, "bmp", new File("hello.bmp"));
>>>>>>>
>>>>>>> In this example, I treat the image as 16 x 16 pixels, using RGB,
>>>>>>> and converting it to a Bitmap. It's up to you to adapt the example
>>>>>>> if your image is a GrayScale or CMYK image, or if you want another
>>>>>>> format.
>>>>>>>
>>>>>>> (And please don't post the same question multiple times!!!)
>>>>>>> --
>>>>>>> This answer is provided by 1T3XT BVBA
>>>>>>> http://www.1t3xt.com/ - http://www.1t3xt.info
>>>>>>>
>>>>>>> ------------------------------------------------------------------------------
>>>>>>> Download Intel® Parallel Studio Eval
>>>>>>> Try the new software tools for yourself. Speed compiling, find bugs
>>>>>>> proactively, and fine-tune applications for parallel performance.
>>>>>>> See why Intel Parallel Studio got high marks during beta.
>>>>>>> http://p.sf.net/sfu/intel-sw-dev
>>>>>>> _______________________________________________
>>>>>>> iText-questions mailing list
>>>>>>> iText-questions@lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/itext-questions
>>>>>>>
>>>>>>> Buy the iText book: http://www.1t3xt.com/docs/book.php
>>>>>>> Check the site with examples before you ask questions:
>>>>>>> http://www.1t3xt.info/examples/
>>>>>>> You can also search the keywords list:
>>>>>>> http://1t3xt.info/tutorials/keywords/
>>>>
>>>> --
>>>> View this message in context:
>>>> http://old.nabble.com/Using-Images-extracted-from-a-pdf-tp27693711p27708516.html
>>>> Sent from the iText - General mailing list archive at Nabble.com.
>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/Using-Images-extracted-from-a-pdf-tp27693711p27709815.html
>>> Sent from the iText - General mailing list archive at Nabble.com.
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Using-Images-extracted-from-a-pdf-tp27693711p27710159.html
>> Sent from the iText - General mailing list archive at Nabble.com.

--
View this message in context: http://old.nabble.com/Using-Images-extracted-from-a-pdf-tp27693711p27710517.html
Sent from the iText - General mailing list archive at Nabble.com.
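
Putting the replies quoted above together, the step from a decoded byte array to a BufferedImage could be sketched roughly as below. This is only a sketch under several assumptions: the stream bytes have already been fully decoded (inflated, and with any /Predictor from /DecodeParms undone), BitsPerComponent is 8, and the color space is DeviceGray or DeviceRGB (treated here as plain gray and sRGB, which is an approximation). The class name RawPixelsToImage and the method fromRawPixels are made up for illustration and are not part of iText.

```java
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

// Hypothetical helper for illustration only; not an iText class.
final class RawPixelsToImage {

    private RawPixelsToImage() { }

    /**
     * Wraps already-decoded pixel data in a BufferedImage.
     * Assumes 8 bits per component and interleaved samples.
     *
     * @param pixels     decoded bytes, row by row, no padding
     * @param width      /Width from the image dictionary
     * @param height     /Height from the image dictionary
     * @param components 1 for gray, 3 for RGB (other values are rejected)
     */
    static BufferedImage fromRawPixels(byte[] pixels, int width, int height, int components) {
        if (components != 1 && components != 3) {
            throw new IllegalArgumentException("only 1 (gray) or 3 (RGB) components are handled");
        }
        if (width <= 0 || height <= 0) {
            throw new IllegalArgumentException("width and height must be positive");
        }
        // Bytes per row is width * number of components, not width * bits.
        int scanlineStride = width * components;
        if ((long) scanlineStride * height > pixels.length) {
            throw new IllegalArgumentException("not enough pixel data for the given dimensions");
        }

        // Band i of every pixel sits at byte offset i inside that pixel.
        int[] bandOffsets = new int[components];
        for (int i = 0; i < components; i++) {
            bandOffsets[i] = i;
        }

        DataBuffer db = new DataBufferByte(pixels, pixels.length);
        WritableRaster raster = Raster.createInterleavedRaster(
                db, width, height, scanlineStride, components, bandOffsets, null);
        ColorSpace cs = ColorSpace.getInstance(
                components == 1 ? ColorSpace.CS_GRAY : ColorSpace.CS_sRGB);
        ColorModel cm = new ComponentColorModel(
                cs, false, false, Transparency.OPAQUE, DataBuffer.TYPE_BYTE);
        return new BufferedImage(cm, raster, false, null);
    }

    public static void main(String[] args) {
        // Same 16 x 16 RGB gradient as in the quoted reply.
        byte[] b = new byte[256 * 3];
        for (int i = 0; i < 256; i++) {
            b[i * 3] = (byte) (255 - i);
            b[i * 3 + 1] = (byte) (255 - i);
            b[i * 3 + 2] = (byte) i;
        }
        BufferedImage bi = fromRawPixels(b, 16, 16, 3);
        System.out.println(bi.getWidth() + " x " + bi.getHeight());
    }
}
```

With the values read from the image dictionary, the call for an RGB image would be fromRawPixels(decodedBytes, width, height, 3), and the result can be handed to ImageIO.write or to an OCR library. CMYK, indexed palettes, and bit depths other than 8 need a different raster and color model and are not covered by this sketch.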