All,

I've managed to sort this out. In case anyone is interested, I switched from a recursive algorithm to an iterative one, which solved the problem:

#Gets the *next* text token in tree
#Nb. Using POE
sub traverse_tree {

   my @pile = @{$_[HEAP]{PILE}};  #retrieve pile - dereference array
                                  #pile is a queue of elements awaiting
                                  #processing
   my $stack = $_[HEAP]{SEEN};    #seen stack - elements already done
   my $output = '';

   #Quit on End of Tree OR when a text token is found
   while ( @pile and $output eq '' ) {

       #Text
         if ( !ref $pile[0] ){
             $output .= shift @pile;
         }
       #An HTML element not previously encountered
         elsif ( !$stack->EXISTS($pile[0]) ) {
             #Remove this element from pile
             my $temp_obj = shift @pile;
             #Add children to pile
             unshift ( @pile, $temp_obj->content_list );
             #Place this element onto seen stack
             $stack->Push( $temp_obj=>1 );
         }
       #A previously encountered element - search for next element
         else {
           #remove the offending item
           my $temp_obj = shift @pile;
           #search depth first for 'next' unseen element
           until ( @pile ){
             #get parent
             $temp_obj = $temp_obj->parent();
             #stop climbing once we are past the root (no parent left)
             last unless defined $temp_obj;
             #add to head of pile if not previously seen
             unshift @pile, $temp_obj unless $stack->EXISTS($temp_obj);
           }
         }

   }

    #Preserve pile
    $_[HEAP]{PILE} = \@pile;
    #Return first text
    return $output;

}
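
For anyone who wants to try the idea without POE or HTML::TreeBuilder, here is a minimal sketch of the same approach. It is not the code above: it assumes a hypothetical tree built from plain hashes ({ tag => ..., content => [...] }) with bare strings as text nodes, and the name make_text_iterator is invented for illustration. The pile is kept in a closure rather than in the POE heap, and each element is shifted off the pile before its children are queued, so in this simplified version no seen list is needed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an HTML tree: elements are hash refs,
# text nodes are plain strings.
my $tree = {
    tag     => 'body',
    content => [
        { tag => 'h1', content => ['Title'] },
        { tag => 'p',  content => [ 'Hello ', { tag => 'b', content => ['world'] }, '!' ] },
        { tag => 'br', content => [] },
    ],
};

# Returns a closure; each call yields the next text node in pre-order,
# or undef once the tree is exhausted.
sub make_text_iterator {
    my @pile = @_;    # nodes still awaiting processing
    return sub {
        while (@pile) {
            my $node = shift @pile;
            return $node unless ref $node;                 # text: stop here
            unshift @pile, @{ $node->{content} || [] };    # element: queue children
        }
        return;       # end of tree
    };
}

my $next_text = make_text_iterator($tree);
my @texts;
while ( defined( my $text = $next_text->() ) ) {
    push @texts, $text;
}
print join( '|', @texts ), "\n";    # prints: Title|Hello |world|!
```

Because the pile survives between calls, each call resumes exactly where the previous one stopped, which is the same role $_[HEAP]{PILE} plays in the POE version.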

James Brown wrote:

Dear All,

I have some HTML stored in a tree structure (courtesy of HTML::TreeBuilder) and now need to traverse this tree in pre-order.
Basically, I need to get the text-only content of the tree, starting from a specified node, but stop the traversal as soon as the first piece of text is found (resuming from that node on the next call).


The $node->as_text() method won't help me because it doesn't stop when it reaches the *first* text content - it simply dumps all the text below the specified node.

I have attempted some code, but unfortunately, it gets stuck in an infinite loop. It can traverse a single branch, but it won't navigate up the tree to try the next branch to the right:


#should return the next text-only content in the tree
sub get_next_text{

  my $tree = shift;
  my ( @pile ) = ( $tree );   # HTML::TreeBuilder/Element obj array
                              # Where we start our traversal
  my $text = '';
  my $exit = 0;               # set once the first text is found

while ( @pile and !$exit ) {

     #######################Debug#####################
     foreach ( @pile ){
        print $_, '=>', $_->tag(), "\n" if ref($_);
        print $_, "\n" if !ref($_);
     }
     print "\n**********\n";
     sleep 5;
     #################################################

     if ( !ref($pile[0]) ) { # Text only - store
         $text .= shift @pile;
         $exit = 1; # all done
     }
     else {                  # Children - enqueue them
         unshift @pile, @{$pile[0]->{'_content'}};
     }

  };
    print "\n$text\n";
    return $pile[0];
}

Don't feel obliged to help me, but if you get a spare minute, I'd definitely appreciate it ;-)

Thanks,

James.



_______________________________________________
ActivePerl mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs


