I discovered that what I really wanted was not ignore_ignorable_whitespace,

but for HTML::Element's as_text to leave a space between child content segments, adding one only
when the preceding text bit does not already end in whitespace.

Current behavior:
given the following pseudo-HTML

<node>
<h2>Joe PerlCamel role model for kids</h2>
<div> Hi, my name is <a href="/blah.html">Joe PerlCamel</a> and I'm a good role model for kids</div>
</node>

my $string = $node->as_text();
print qq{$string\n};

gives (note the missing space in "kidsHi"): Joe PerlCamel role model for kidsHi, my name is Joe PerlCamel and I'm a good role model for kids


I would like to submit a patch to HTML::Element.

The proposed method name is:

as_text_w_space


It simply looks like this:


sub as_text_w_space {
    # Yet another iteratively implemented traverser, as in as_text(),
    # except that a space is appended to each text bit that does not
    # already end in whitespace.
    my ($this, %options) = @_;
    my $skip_dels = $options{'skip_dels'} || 0;
    my @pile = ($this);
    my $tag;
    my $text = '';
    while (@pile) {
        if (!defined $pile[0]) {        # undef!
            shift @pile;                # discard it, or the loop never ends
        } elsif (!ref $pile[0]) {       # text bit! save it!
            my $val = shift @pile;
            # add a space after each text bit unless already there
            $val .= ' ' unless $val =~ /\s\z/;
            $text .= $val;
        } else {                        # it's a ref: traverse under it
            # [] instead of HTML::Element's file-scoped $nillio, so the
            # sub also compiles under strict outside HTML/Element.pm
            unshift @pile, @{ $this->{'_content'} || [] }
                unless ($tag = ($this = shift @pile)->{'_tag'}) eq 'style'
                    or $tag eq 'script'
                    or ($skip_dels and $tag eq 'del');
        }
    }
    return $text;
}
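Appending a space after every text bit also leaves a trailing space at the end and doubles the space when the next bit already starts with one (as " and I'm..." does after the link). A possible refinement is to apply the rule only between segments. The sketch below uses core Perl only and works on a plain list of strings rather than an HTML::Element tree; join_with_space is a hypothetical name, not part of HTML::Element.

```perl
use strict;
use warnings;

# Hypothetical helper (not in HTML::Element): join text segments,
# inserting a single space only where neither side of the boundary
# already has whitespace. Empty and undefined segments are skipped.
sub join_with_space {
    my @segments = @_;
    my $text = '';
    for my $seg (@segments) {
        next unless defined $seg && length $seg;
        $text .= ' '
            if length $text && $text !~ /\s\z/ && $seg !~ /\A\s/;
        $text .= $seg;
    }
    return $text;
}

# The text bits from the example above, in document order.
print join_with_space(
    'Joe PerlCamel role model for kids',
    'Hi, my name is ',
    'Joe PerlCamel',
    " and I'm a good role model for kids",
), "\n";
# prints: Joe PerlCamel role model for kids Hi, my name is Joe PerlCamel and I'm a good role model for kids
```

Inside a traverser like as_text_w_space, the same test would be applied where each text bit is appended to $text, instead of padding every bit unconditionally.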


Let me know what you think.

Is Sean around?

Cheers!



deborah sciales wrote:

Hello,

I'm using TreeBuilder and am finding it useful.

I have a few questions.

One is that if I turn off ignore_ignorable_whitespace, as below, I get errors when calling element methods.

Here is an example:

sub get_content {
    my $string = shift;
    my $tree = HTML::TreeBuilder->new; # empty tree
    $tree->no_space_compacting(1);
    $tree->ignore_ignorable_whitespace(0);
    $tree->parse($string);
    $tree->eof;
    #$tree->elementify;
    my $content = '';
    $tree = delete_unwanted_nodes($tree);
    my $node = $tree->find_by_tag_name('body');
    #$node = $node->nativize_pre_newlines();
    my @nodes = $node->content_list();
    foreach my $node (@nodes) {
        my $cont = $node->as_text(skip_dels => 1);
        if ($cont) {
            $content .= $cont;
        }
    }
    $tree = $tree->delete;
    return $content;
}

I get the error: Can't call method "as_text" without a package or object reference at ./test.pl line 152.

The error goes away if I comment out the ignore_ignorable_whitespace line.
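With ignore_ignorable_whitespace(0), whitespace between tags is kept as text nodes, and HTML::Element returns text nodes from content_list() as plain, unblessed strings rather than objects. Calling ->as_text on such a string is the likely cause of the error above. A minimal sketch of one workaround, checking ref() before calling element methods; it assumes HTML::TreeBuilder is installed and leaves out the delete_unwanted_nodes() step, since that helper is not shown here.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Collect the text of <body>'s children while keeping ignorable whitespace.
# content_list() mixes HTML::Element objects with plain text strings,
# so as_text is only called on references.
sub get_content {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->no_space_compacting(1);
    $tree->ignore_ignorable_whitespace(0);
    $tree->parse($html);
    $tree->eof;

    my $content = '';
    my $body = $tree->find_by_tag_name('body');
    if ($body) {
        for my $child ($body->content_list) {
            # Text nodes are unblessed strings: append them as-is.
            my $text = ref $child ? $child->as_text(skip_dels => 1) : $child;
            # length() rather than truth, so a text node of "0" is kept
            $content .= $text if defined $text && length $text;
        }
    }
    $tree->delete;    # break the parent/child circular references
    return $content;
}

print get_content("<html><body><p>One</p>\n<p>Two</p></body></html>"), "\n";
```

To drop whitespace-only text nodes instead of keeping them, `next unless ref $child;` at the top of the loop skips every non-element child.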

Also, the method nativize_pre_newlines is described in the HTML::Element docs but is not implemented. I've written my own simple nativizer; I just wanted to point that out. I've also written my own as_text_with_newlines to get around this, but wanted to comment on it.

Thanks to Gisle and Sean for a great set of modules!



