There are quite a few different "draw this text" operators.  TJ and Tj
are two of... let's see... four.  The other two are ' and ".

(blah) '
is equivalent to
T*
(blah) Tj

1 2 (blah) "
is equivalent to
1 Tw
2 Tc
T*
(blah) Tj

Tw sets the word spacing
Tc sets the character spacing
T* advances to the next line based on the current leading (set by TL)

Tj and TJ cover ninety-something percent of the cases.  In fact, I don't
know that I've ever seen a ' or " In The Wild.  Nonetheless, if you want
to be thorough, include them.

--Mark Storer
  Senior Software Engineer
  Cardiff.com

import legalese.Disclaimer;
Disclaimer<Cardiff> DisCard = null;

> -----Original Message-----
> From: sal salaimani [mailto:[email protected]]
> Sent: Tuesday, June 01, 2010 3:41 PM
> To: [email protected]
> Subject: [iText-questions] Parsing marked and unmarked content
>
>
> I am parsing marked and unmarked content using the PdfContentParser and
> PRTokeniser classes of the iText API. Here are my algorithms.
>
> logic 1
> Getting Marked Content
> 1. Look for the dictionary starting point
> 2. If the next token is "MCID", loop through until I find the "EMC"
>    operator
> 3. Inside the loop I keep concatenating strings until I hit the "TJ" or
>    "Tj" operator and store them in an ArrayList
> logic 2
> Getting All text
> 1. Loop through until end of file
> 2. Inside the loop I keep concatenating strings until I hit the "TJ" or
>    "Tj" operator and store them in an ArrayList
> logic 3
> Getting Unmarked content
> 1. I find the difference between the logic 1 and logic 2 ArrayLists and
>    store the result.
>
> These algorithms work for me. Can anyone tell me whether they will work
> for all test cases?
>
> Also I am attaching the code snippet of the algorithms.
>
> logic 1
>
> while (tokenizer.nextToken()) {
>     if (tokenizer.getTokenType() == PRTokeniser.TK_NAME
>             && tokenizer.getStringValue().equals("Artifact")) {
>         skip_artifact_flag = true;
>         continue;
>     }
>     if (tokenizer.getTokenType() == PRTokeniser.TK_START_DIC) {
>         tokenizer.nextToken();
>         if (tokenizer.getStringValue().equals("MCID")) {
>             skip_artifact_flag = false;
>             tokenizer.nextToken();
>             mcid_i = tokenizer.intValue();
>
>             //need to have loop until EMC or
>             while (tokenizer.nextToken()) {
>                 if (tokenizer.getTokenType() == PRTokeniser.TK_OTHER
>                         && tokenizer.getStringValue().equals("EMC")) {
>                     mcid_i = -1;
>                     break;
>                 }
>                 if (tokenizer.getTokenType() == PRTokeniser.TK_STRING
>                         && skip_artifact_flag == false)
>                     value = value + tokenizer.getStringValue();
>
>                 if (tokenizer.getTokenType() == PRTokeniser.TK_OTHER
>                         && (tokenizer.getStringValue().equals("TJ")
>                             || tokenizer.getStringValue().equals("Tj"))) {
>                     if (!value.trim().equals("")) {
>                         //mcidMap.put(new Integer(mcid_i).toString(),value);
>                         TxtcontentMarked.add(value);
>                     }
>                     value = "";
>                 }
>             }
>         }
>     }
> }
>
> logic 2
>
> while (tokenizer.nextToken()) {
>     // if()
>     if (tokenizer.getTokenType() == PRTokeniser.TK_OTHER
>             && (tokenizer.getStringValue().equals("TJ")
>                 || tokenizer.getStringValue().equals("Tj"))) {
>         if (!value.trim().equals(""))
>             Txtcontent.add(value);
>         value = "";
>         //break;
>     }
>     if (tokenizer.getTokenType() == PRTokeniser.TK_STRING)
>         value = value + tokenizer.getStringValue();
>     // System.out.println("va ="+ value);
> }
>
> logic 3
>
> // Iterator iterator = mcidMap.keySet().iterator();
> int arrayListSize = Txtcontent.size();
> // TxtNotMarked = Txtcontent;
> int arrayListSize0 = TxtcontentMarked.size();
>
> for (int k = 0; k < arrayListSize0; k++) {
>     //{
>     // while (iterator.hasNext()) {
>     // String key = iterator.next().toString();
>     // String value_h = mcidMap.get(key).toString();
>
>     // System.out.println("[ "+key + " ] " + "[[[------]]]" + value_h);
>
>     for (int i = 0; i < arrayListSize; i++) {
>         //System.out.println("Content = "+Txtcontent.get(i));
>         if (TxtcontentMarked.get(k).trim().equals(Txtcontent.get(i).trim())) {
>             //TxtNotMarked.add(Txtcontent.get(i));
>             //TxtNotMarked.remove(i);
>             Txtcontent.remove(i);
>             arrayListSize = Txtcontent.size();
>             i--; // re-check this index: the next element shifted into it
>         }
>     }
> }
>
> Sal Salaimani
> --
> View this message in context:
> http://itext-general.2136553.n4.nabble.com/Parsing-marked-and-unmarked-content-tp2239347p2239347.html
> Sent from the iText - General mailing list archive at Nabble.com.
>
> ------------------------------------------------------------------------------
> _______________________________________________
> iText-questions mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/itext-questions
>
> Buy the iText book: http://www.itextpdf.com/book/
> Check the site with examples before you ask questions:
> http://www.1t3xt.info/examples/
> You can also search the keywords list:
> http://1t3xt.info/tutorials/keywords/
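
For what it's worth, here is a minimal sketch of the ' and " equivalences from the top of this mail, in plain Java with no iText calls. TextShowOps, isTextShowingOperator and expand are names I made up for illustration; nothing like them ships with iText, and the operands are treated as already-tokenized strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: recognise the four text-showing operators. */
final class TextShowOps {

    /** True for Tj, TJ, ' and ", the four operators that draw text. */
    static boolean isTextShowingOperator(String op) {
        return op.equals("Tj") || op.equals("TJ")
                || op.equals("'") || op.equals("\"");
    }

    /**
     * Rewrites one operation into the equivalent sequence that only
     * uses Tj to show text. 'operands' are the tokens that preceded
     * the operator, in the order they appeared.
     */
    static List<String> expand(List<String> operands, String op) {
        List<String> out = new ArrayList<>();
        if (op.equals("'") && operands.size() == 1) {
            // (blah) '   ==   T*  (blah) Tj
            out.add("T*");
            out.add(operands.get(0) + " Tj");
        } else if (op.equals("\"") && operands.size() == 3) {
            // 1 2 (blah) "   ==   1 Tw  2 Tc  T*  (blah) Tj
            out.add(operands.get(0) + " Tw");
            out.add(operands.get(1) + " Tc");
            out.add("T*");
            out.add(operands.get(2) + " Tj");
        } else {
            // Everything else is passed through unchanged.
            StringBuilder sb = new StringBuilder();
            for (String operand : operands) {
                sb.append(operand).append(' ');
            }
            out.add(sb.append(op).toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand(Arrays.asList("(blah)"), "'"));
        System.out.println(expand(Arrays.asList("1", "2", "(blah)"), "\""));
    }
}
```

In the quoted loops, the two-way test on "TJ" / "Tj" could then become a single isTextShowingOperator(tokenizer.getStringValue()) call, so strings shown with ' or " are flushed into the list as well.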
