You could try something like this...
$dom->find('*')->each(sub { state $i = 0; $_->{_myapp_counter} = $i++ });
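It would add an attribute to every tag, which may or may not be a problem
for your application. Because find('*') returns elements in document order,
the counter records each element's position in the source, so you can later
sort the results of any find() call on it.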
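A fuller sketch of the counter idea, assuming Mojolicious is installed. The sample HTML and the separate h1/p queries are made up for illustration; the attribute name _myapp_counter is taken from the one-liner above. Separate find() calls are merged back into source order by sorting on the counter:

```perl
use Mojo::Base -strict;   # strict, warnings and the say/state features
use Mojo::DOM;

# Made-up sample document
my $dom = Mojo::DOM->new(
  '<h1>Title</h1><div><p>One</p></div><p>Two</p><h1>Next</h1>');

# Stamp every element with its position. find('*') walks the tree
# depth-first, i.e. in source order. This adds a real attribute that
# will show up if you render the DOM again.
my $i = 0;
$dom->find('*')->each(sub { $_->{_myapp_counter} = $i++ });

# Separate queries, e.g. one per tag you are interested in...
my @nodes = (@{$dom->find('h1')}, @{$dom->find('p')});

# ...can be put back into document order by sorting on the counter.
my @ordered = sort { $a->{_myapp_counter} <=> $b->{_myapp_counter} } @nodes;

say join ' ', $_->{_myapp_counter}, $_->tag, $_->text for @ordered;
```

The counter values are what you would store in the database alongside each node, instead of a line number.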
Alternatively, you could walk through $dom->find('*') in order and test
$_->tag (or other methods) to collect the elements you want in document
order. This only works if your criteria are simple enough that you don't
really need a CSS selector to find them.
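A sketch of that single-pass approach, again assuming Mojolicious. The %wanted set of tag names and the sample HTML are hypothetical stand-ins for your real criteria:

```perl
use Mojo::Base -strict;
use Mojo::DOM;

# Made-up sample document
my $dom = Mojo::DOM->new(
  '<h1>A</h1><div><p>B</p><span>x</span></div><p>C</p><h2>D</h2>');

# Hypothetical criteria: the tag names you care about
my %wanted = map { $_ => 1 } qw(h1 h2 p);

# One pass over all elements in document order, keeping the wanted ones
my $found = $dom->find('*')->grep(sub { $wanted{$_->tag} });

# $num starts at 1 and is the element's position within $found
$found->each(sub {
  my ($e, $num) = @_;
  say join ' ', $num, $e->tag, $e->text;
});
```

Because there is only one pass, every tag you care about ends up in a single sequence, rather than one sequence per tag name.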
On Fri, Jul 22, 2016 at 1:24 PM, Ekki Plicht wrote:
>
> On Friday, July 22, 2016 16:28:07 UTC+2, Scott Wiersdorf wrote:
>>
>> You can use map() to do that:
>>
>> $dom->find('div')->map(sub { state $i = 0; say $i++ . " $_" });
>>
>
> Right, that would give me the proper sequence for all <div>s.
> And then I would have another sequence for all <h1>s, and another for all
> <td>s and another for all <p>s, and so on.
>
> What I need is one sequence which gives me the right order of all tags I
> am looking at.
>
> Cheers,
> Ekki
>
>>
>> Scott
>>
>> On Friday, July 22, 2016 at 1:44:45 AM UTC-6, Ekki Plicht wrote:
>>>
>>> I use Mojo::DOM for various web scraping and analysis, very easy, very
>>> fast, nice.
>>>
>>> Usually I am interested in only a few tags, not the entire dom. So I use
>>> ->find() to select the interesting nodes, check some facts on the found
>>> nodes and store the results in a database for later viewing.
>>>
>>> For this later viewing I would love to retain the sequence in which the
>>> nodes are in the source. Unfortunately all information about the sequence
>>> of tags is lost when I use ->find().
>>>
>>> The parser I used before (HTML::HTML5::Parser) provides a line-number
>>> function for each element. That is enough for me to retain the sequence
>>> of nodes; the absolute position is not important.
>>>
>>> Do you think it would be possible to extend Mojo::DOM to provide a line
>>> number for each element? I understand this might be insufficient when
>>> many tags are on the same line, but that's too bad then...
>>>
>>> TIA,
>>> Ekki
>>>
>>> --
> You received this message because you are subscribed to the Google Groups
> "Mojolicious" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/mojolicious.
> For more options, visit https://groups.google.com/d/optout.