I use Mojo::DOM for various web scraping and analysis tasks. It is very easy
to use, very fast, nice.

Usually I am interested in only a few tags, not the entire DOM. So I use
->find() to select the interesting nodes, check some facts about the found
nodes, and store the results in a database for later viewing.

For this later viewing I would love to retain the order in which the
nodes appear in the source. Unfortunately, all information about the order
of the tags is lost when I use ->find().

The parser I used before (HTML::HTML5::Parser) provides a line-number
function for each element. That is enough for me to retain the order of the
nodes; the absolute position is not important.

Do you think it would be possible to extend Mojo::DOM to provide a line
number for each element? I understand this might be insufficient when
many tags are on the same line, but that's too bad
then...
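For illustration only (this is not Mojo::DOM and says nothing about its internals): Python's standard-library `html.parser.HTMLParser` has `getpos()`, which returns the (line, column) of the tag currently being parsed. A minimal sketch, assuming `h1` and `a` are the "interesting" tags, shows how recording a (line, column) pair per element keeps the source order even when several tags share one line. `PositionRecorder` is a hypothetical name chosen for this sketch.

```python
from html.parser import HTMLParser


class PositionRecorder(HTMLParser):
    """Hypothetical helper: record (line, column) of each wanted start tag."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = []  # list of (line, column, tag, attrs) in source order

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            # getpos() gives the position of the '<' of this tag:
            # line is 1-based, column is 0-based.
            line, col = self.getpos()
            self.found.append((line, col, tag, dict(attrs)))


html_doc = """<html><body>
<h1>Title</h1>
<p><a href="/one">one</a> <a href="/two">two</a></p>
</body></html>
"""

parser = PositionRecorder({"h1", "a"})
parser.feed(html_doc)
parser.close()

# The two <a> tags share line 3 but differ in column, so sorting by
# (line, column) still reproduces the order they have in the source.
for line, col, tag, attrs in parser.found:
    print(line, col, tag, attrs.get("href"))
```

A line number alone would tie the two links on line 3; adding the column breaks the tie, which is the case mentioned above.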

TIA,
Ekki

-- 
You received this message because you are subscribed to the Google Groups 
"Mojolicious" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to mojolicious+unsubscr...@googlegroups.com.
To post to this group, send email to mojolicious@googlegroups.com.
Visit this group at https://groups.google.com/group/mojolicious.
For more options, visit https://groups.google.com/d/optout.