Exactly the example I needed, thank you Jeffrey :-) I think it deserves to be shared.
I'm very busy for the next 2 weeks, but after that I plan to read the whole Marpa::R2::HTML doc in detail and play with it a bit. I may also try to implement the same things with Mojo::UserAgent, to get a feel for how they compare.

If I understand correctly:

1. First, all the <a> tags are parsed and get replaced by [ <a> http://some-link.com <http://the-link.com></a>, 0 ].
2. Then the <table> handler updates the is_in_table boolean of all contained links.
3. Finally, at the TOP, the results are pushed into a list.

Step 2 works because M::R2::HTML::values() only returns elements that previously had actions run on them, right? (It won't return paragraphs, for instance. But if I added a paragraph handler, that would interfere. I did some tests to check this.)

I could not work out from the documentation (though I haven't read it thoroughly yet) what happens to the array returned by each handler. I get access to these returned arrays later with M::R2::HTML::values(). Is that all I need to know about it? Maybe that should be described in the section called THE Marpa::R2::HTML::html STATIC METHOD.

Here are my impressions so far:

- It makes it trivial to deal with nested structures, nice!
- It's very good at dealing with broken HTML.
- It starts actions at the deepest levels of the tree and then "comes back up", instead of "going down" as most other scrapers do. So the code will look very different. I'm not sure how I feel about that yet: excited to be learning something new, and wondering whether the resulting code will be easier or harder to understand and maintain.
- Classical scrapers seem to make fine-grained control "more intuitive", but that's probably because I'm not used to this new way of scraping.

Did I miss any additional benefit of scraping with Marpa::R2::HTML?

I'll report back when I've done more experiments. Please criticize my feedback, so I know whether it's useful or a bother: do I ask too many questions? Do I write too much?
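P.S. To check my own understanding of the bottom-up flow in the three steps above, here is a minimal Python sketch. It is only an analogy built on the standard library's html.parser, not Marpa::R2::HTML's actual API: the BottomUpParser class, the handler signature (attrs, values), the rule that each handler returns a list, and the example URLs are all my own assumptions.

```python
from html.parser import HTMLParser


class BottomUpParser(HTMLParser):
    """Hypothetical helper: runs a handler when an element closes,
    so the deepest elements are handled before their ancestors."""

    def __init__(self, handlers):
        super().__init__()
        self.handlers = handlers  # tag name -> callable(attrs, values) -> list
        # Each frame is (tag, attrs, values returned by handled descendants).
        self.stack = [("TOP", {}, [])]

    def handle_starttag(self, tag, attrs):
        self.stack.append((tag, dict(attrs), []))

    def handle_endtag(self, tag):
        if tag not in [frame[0] for frame in self.stack[1:]]:
            return  # stray end tag: ignore it
        while True:  # also closes any elements left open inside this one
            if self._close_innermost() == tag:
                break

    def _close_innermost(self):
        name, attrs, values = self.stack.pop()
        handler = self.handlers.get(name)
        # Without a handler, the descendants' values pass through unchanged.
        result = handler(attrs, values) if handler else values
        self.stack[-1][2].extend(result)
        return name

    def finish(self):
        self.close()
        while len(self.stack) > 1:  # close elements that were never closed
            self._close_innermost()
        return self.handlers["TOP"]({}, self.stack[0][2])


def on_a(attrs, values):
    # Step 1: each link becomes a small record, flagged "not in a table".
    return [{"href": attrs.get("href"), "in_table": False}]


def on_table(attrs, values):
    # Step 2: only values produced by handlers below this table arrive here.
    for link in values:
        link["in_table"] = True
    return values


def on_top(attrs, values):
    # Step 3: whatever bubbled up to the top is the final result list.
    return values


HANDLERS = {"a": on_a, "table": on_table, "TOP": on_top}

PAGE = (
    '<p><a href="http://example.com/outside">x</a></p>'
    '<table><tr><td><a href="http://example.com/inside">y</a></td></tr></table>'
)


def scrape(html):
    parser = BottomUpParser(HANDLERS)
    parser.feed(html)
    return parser.finish()


if __name__ == "__main__":
    print(scrape(PAGE))
```

In this sketch the paragraph has no handler, so the link inside it passes straight up to the top, while the link inside the table gets its flag flipped on the way up. Adding a "p" handler that returned something other than its incoming values would change what the levels above it see, which is the kind of interference I noticed in my tests.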
