Re: [iText-questions] page number determination

newoutlook Fri, 22 Jan 2010 23:47:00 -0800

You are right, I am parsing the StructTree of the PDF document. First, I get
the indirect reference to the page object from the tag or structure element
(table, link, list, etc.) I am looking at. Then I compare the object number
of each page's original reference with the object number of the tag's page
reference. If the numbers match, the page number is the position I have
reached in the comparison.


For a tag or structure element that does not have a page object, I get its
children and follow the steps above to get the page number from them.

There are cases where neither the tag or structure element nor any of its
children has a page object, so I cannot determine the page number in these
cases.


Here are the code segments I use to get page numbers.

Method 1
        /**
         * Retrieve the page number of the page referenced by an indirect reference.
         * @param pageRef An indirect reference to the page to look up.
         * @return the 1-based page number, or -1 if the page is not found.
         */
        public int getPageNumber(PdfIndirectReference pageRef) {
                if (pageRef == null) return -1;
                // Page numbers are 1-based, so pass i of the loop checks page i+1.
                for (int i = 0; i < reader.getNumberOfPages(); ++i) {
                        if (reader.getPageOrigRef(i + 1).getNumber() == pageRef.getNumber())
                                return i + 1;
                }
                return -1;
        }
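
Method 1 scans every page on each call. As a sketch only, and not part of the code I actually use, the same lookup could be done against an index built once. PageIndex and pageNumberOf are hypothetical names, and the int array is assumed to hold the values of reader.getPageOrigRef(i).getNumber() for i = 1..reader.getNumberOfPages(), in page order.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: maps a page object's number to its 1-based page number. */
class PageIndex {
    private final Map<Integer, Integer> pageByObjectNumber = new HashMap<>();

    /**
     * @param pageObjectNumbers object numbers of the page references, in page order
     *        (assumed to be collected from the reader beforehand)
     */
    PageIndex(int[] pageObjectNumbers) {
        for (int i = 0; i < pageObjectNumbers.length; i++) {
            // Array index i corresponds to page i + 1.
            pageByObjectNumber.put(pageObjectNumbers[i], i + 1);
        }
    }

    /** Returns the 1-based page number, or -1 if no page has that object number. */
    int pageNumberOf(int objectNumber) {
        return pageByObjectNumber.getOrDefault(objectNumber, -1);
    }
}
```

Like Method 1, this compares object numbers only and ignores the generation number.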

Method 2
        public int getPageNumber() {
                // Any value other than -2 is treated as an already computed result.
                if (pageNumber != -2)
                        return pageNumber;

                pageNumber = -1;
                if (getITxtDictionary().contains(PdfName.PG)) {
                        // Try the page reference directly (this calls Method 1).
                        pageNumber = getDocument().getPageNumber(
                                        getITxtDictionary().getAsIndirectObject(PdfName.PG));
                }
                if (pageNumber == -1) {
                        // No valid direct reference: use the lowest valid page
                        // number found among the children.
                        int firstPage = Integer.MAX_VALUE;
                        ValPdfNode[] children = getChildNodes();
                        for (ValPdfNode child : children) {
                                int childPage = child.getPageNumber();
                                if (childPage != -1)
                                        firstPage = Math.min(firstPage, childPage);
                        }

                        if (firstPage == Integer.MAX_VALUE)
                                firstPage = -1;

                        pageNumber = firstPage;
                }

                return pageNumber;
        }
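
For the cases where neither an element nor its children give a page, one possible fallback is to borrow a page from the nearest ancestor whose subtree has one. This is only a heuristic sketch under my own assumptions: it is not something iText or the PDF specification prescribes, and the result is a guess at a nearby page. StructNode, pageFromSubtree and resolvePage are hypothetical names standing in for ValPdfNode and Method 2.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a structure element; not an iText class. */
class StructNode {
    static final int UNKNOWN = -1;

    private final int ownPage;   // page resolved from this node's own /Pg, or UNKNOWN
    private StructNode parent;
    private final List<StructNode> kids = new ArrayList<>();

    StructNode(int ownPage) {
        this.ownPage = ownPage;
    }

    StructNode add(StructNode kid) {
        kid.parent = this;
        kids.add(kid);
        return this;
    }

    /** Own page, else the lowest page found among descendants, else UNKNOWN. */
    int pageFromSubtree() {
        if (ownPage != UNKNOWN) return ownPage;
        int first = Integer.MAX_VALUE;
        for (StructNode kid : kids) {
            int p = kid.pageFromSubtree();
            if (p != UNKNOWN) first = Math.min(first, p);
        }
        return first == Integer.MAX_VALUE ? UNKNOWN : first;
    }

    /** Subtree lookup first; if that fails, walk up the ancestors (heuristic). */
    int resolvePage() {
        int p = pageFromSubtree();
        for (StructNode a = parent; p == UNKNOWN && a != null; a = a.parent) {
            p = a.pageFromSubtree();
        }
        return p;
    }
}
```

A node with no page anywhere in its own subtree or in any ancestor's subtree still yields -1.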

 


1T3XT info wrote:
> 
> newoutlook wrote:
>> The intent here is to validate pdf doc for more readability like
>> validating
>> table structure in pdf contents. 
> 
> You're talking about Marked Content / Tagged PDF, aren't you?
> Then why don't you say so?
> 
>> I am determining a page number for pdf
>> contents.
> 
> How did you get the content stream?
> Normally you should get it using the page number.
> 
>> I parse pdf contents as a tree node like table,list, and link
>> etc...
> 
> You parse the StructTree? Not the content stream?
> The way you phrase a question is very labyrinthic.
> 
>>For example, I get indirect reference for node (table) using this
>> statement PdfIndirectReference pageRef  =
>> getITxtDictionary().getAsIndirectObject(PdfName.PG);  and use the
>> following
>> for loop like simplebookmark example to get page number. Please let me
>> know,
>> if this works for me.
> 
> I don't understand the problem.
> Seems like you're doing something weird.
> -- 
> This answer is provided by 1T3XT BVBA
> http://www.1t3xt.com/ - http://www.1t3xt.info
> 
> ------------------------------------------------------------------------------
> Throughout its 18-year history, RSA Conference consistently attracts the
> world's best and brightest in the field, creating opportunities for
> Conference
> attendees to learn about information security's most important issues
> through
> interactions with peers, luminaries and emerging and established
> companies.
> http://p.sf.net/sfu/rsaconf-dev2dev
> _______________________________________________
> iText-questions mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/itext-questions
> 
> Buy the iText book: http://www.1t3xt.com/docs/book.php
> Check the site with examples before you ask questions:
> http://www.1t3xt.info/examples/
> You can also search the keywords list:
> http://1t3xt.info/tutorials/keywords/
> 
> 

-- 
View this message in context: 
http://old.nabble.com/page-number-determination-tp27218494p27281722.html
Sent from the iText - General mailing list archive at Nabble.com.

