On Friday, 22 July 2016 20:51:06 UTC+2, Dan Book wrote:
>
> You could try something like this...
>
> $dom->find('*')->map(sub { state $i = 0; $_->{_myapp_counter} = $i++ });
>
>
Nice idea, but it wouldn't work.
A doc like
<h2>1</h2>
<p>bar</p>
<h2>2</h2>
<p>bar</p>
would result in the numbering
1. 1st h2
2. 2nd h2
3. 1st p
4. 2nd p
whereas the sequence actually is 1 3 2 4. And that's all I want: the
sequence in which the elements showed up in the original HTML source.
> Alternatively you could go through $dom->find('*') in order and test
> $_->tag or other methods to collect the tags in the order you want. This
> would only work if your criteria are simple enough that you don't really
> need the CSS selector to find them.
>
I thought about this, but it would render all use of selectors futile and
probably take much longer. I would have to loop through all elements, check
each one for a match (against maybe a dozen very simple selectors),
and then work on that element. I found this concept so inelegant that I
went back to HTML::HTML5::Parser, which has a source_line() function.
OK, the penalty is having to deal with XML, XPath and so on. Mojo::DOM is so
much easier to use in most other cases...
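
For the record, the loop I decided against would look roughly like the
sketch below. It assumes, as Dan says, that find('*') hands back the
elements in document order. The %wanted set and the @ordered list are made
up for this example, and a plain tag test stands in for my real selectors.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Sample document from above, parsed as a fragment
my $dom = Mojo::DOM->new('<h2>1</h2><p>bar</p><h2>2</h2><p>bar</p>');

# Tags I care about (hypothetical selection for this example)
my %wanted = map { $_ => 1 } qw(h2 p);

# Walk every element once and keep the interesting ones,
# numbering them in the order they are visited
# (assumption: find('*') visits elements in document order)
my @ordered;
my $i = 0;
$dom->find('*')->each(sub {
  my $el = shift;
  return unless $wanted{$el->tag};
  push @ordered, [$i++, $el->tag, $el->text];
});

# One line per kept element: sequence number, tag, text
say join ' ', @$_ for @ordered;
```

With a dozen real selectors the tag test would turn into a chain of
checks per element, which is exactly the part I find inelegant.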
Regards,
Ekki
>
> On Fri, Jul 22, 2016 at 1:24 PM, Ekki Plicht <[email protected]> wrote:
>
>>
>> On Friday, 22 July 2016 16:28:07 UTC+2, Scott Wiersdorf wrote:
>>>
>>> You can use map() to do that:
>>>
>>> $dom->find('div')->map(sub { state $i = 0; say $i++ . " $_" });
>>>
>>
>> Right, that would give me the proper sequence for all <div>s.
>> And then I would have another sequence for all <h1>s, and another for all
>> <td>s and another for all <p>s, and so on.
>>
>> What I need is one sequence which gives me the right order of all tags I
>> am looking at.
>>
>> Cheers,
>> Ekki
>>
>>>
>>>
>>> Scott
>>>
>>> On Friday, July 22, 2016 at 1:44:45 AM UTC-6, Ekki Plicht wrote:
>>>>
>>>> I use Mojo::DOM for various web scraping and analysis, very easy, very
>>>> fast, nice.
>>>>
>>>> Usually I am interested in only a few tags, not the entire DOM. So I
>>>> use ->find() to select the interesting nodes, check some facts on the
>>>> found nodes and store the results in a database for later viewing.
>>>>
>>>> For this later viewing I would love to retain the sequence in which the
>>>> nodes are in the source. Unfortunately all information about the sequence
>>>> of tags is lost when I use ->find().
>>>>
>>>> The parser I used before (HTML::HTML5::Parser) provides a
>>>> line-number function for each element. This is enough for me to retain the
>>>> sequence of nodes; the absolute position is not important.
>>>>
>>>> Do you think it would be possible to extend Mojo::DOM to provide a line
>>>> number for each element? I understand that this might be insufficient
>>>> where many tags are on the same line, but that's too bad
>>>> then...
>>>>
>>>> TIA,
>>>> Ekki
>>>>
>>>>
>>>>
>>>>
>>>> --
>> You received this message because you are subscribed to the Google Groups
>> "Mojolicious" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/mojolicious.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>