> Any chance you could have a go at converting some of the parsing logic
Wow - I'm good but not that good. It's a pretty big project. I don't want to move too slowly, having done almost nothing on edbrowse in the past 6 months, but I don't want to run recklessly fast either. I need to pass designs by you guys before coding, etc.

There are still some big questions to answer. Is tidy5 the right path, or perhaps libhubbub, which could be part of a larger browser effort, larger than just parsing html? See netsurf-browser.org.

I'm running another sanity check on tidy. This generates an error because & is not escaped, and yes, it probably should be.

<A href="www.x.com/path?a=b&ccopy=d>

It even converts &copy into the copyright symbol, which is now part of the url. So ok, maybe I did a bad test because I'm not following spec, but the internet doesn't follow spec either, not all the time. Look at the raw html from www.sciam.com. It contains these two lines, on the same home page.

<li><a href="https://www.scientificamerican.com/store/subscribe/scientific-american-all-access/?WT.mc_id=SA_Webstore_SCA_AllAccess_SubCenter&amp;responseKey=W3S03RD00" target="_blank">Subscribe to All Access <span class="red">»</span></a></li>
<li><a href="https://w1.buysub.com/servlet/OrdersGateway?cds_mag_code=SCA&cds_page_id=185258&cds_response_key=I5S03R00B" target="_blank">Subscribe to Print <span class="red">»</span></a></li>

The first one has & escaped, the second one does not. I just wanted to make sure tidy handles these two cases properly, and it does. Happily, my parser also handles these cases properly. I must have run into this at some point.

I'll continue testing. Assuming I uncover no serious problems, I think the next step is to enhance our edbrowse node with enough attributes to faithfully copy the information from a tidy node. We have some of the attributes but not enough. A blatant omission is a text string, because we never represented text nodes before. We'll need this, and child pointers, and a list of attribute value pairs, and other things.
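To make the list of missing pieces concrete, here is a rough sketch in C of a node that carries a text string, child pointers, and a list of attribute value pairs. Every name in it (sketchNode, sketchAttr, nodeNew, nodeAddAttr, nodeGetAttr, nodeAddChild, nodeFree) is hypothetical; it is not the existing edbrowse node and not tidy's own structure, just one possible shape for the design discussion.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; not edbrowse's or tidy's real structures. */

struct sketchAttr {
	char *name;
	char *value;
	struct sketchAttr *next;
};

struct sketchNode {
	char *tag;			/* element name, NULL for a text node */
	char *text;			/* text content, NULL for an element */
	struct sketchAttr *attrs;	/* attribute value pairs, in source order */
	struct sketchNode *firstChild;
	struct sketchNode *lastChild;
	struct sketchNode *sibling;	/* next sibling under the same parent */
	struct sketchNode *parent;
};

/* Portable stand-in for strdup(); returns NULL for NULL input. */
static char *dupString(const char *s)
{
	size_t n;
	char *t;
	if (!s)
		return NULL;
	n = strlen(s) + 1;
	t = malloc(n);
	if (t)
		memcpy(t, s, n);
	return t;
}

/* Pass a tag for an element, or text for a text node. */
struct sketchNode *nodeNew(const char *tag, const char *text)
{
	struct sketchNode *n = calloc(1, sizeof(*n));
	if (!n)
		return NULL;
	n->tag = dupString(tag);
	n->text = dupString(text);
	return n;
}

/* Append an attribute at the end of the list; returns 0 on success. */
int nodeAddAttr(struct sketchNode *n, const char *name, const char *value)
{
	struct sketchAttr *a, **p;
	assert(n && name);
	a = calloc(1, sizeof(*a));
	if (!a)
		return -1;
	a->name = dupString(name);
	a->value = dupString(value ? value : "");
	for (p = &n->attrs; *p; p = &(*p)->next)
		;
	*p = a;
	return 0;
}

/* Look up an attribute by exact name; NULL if absent. */
const char *nodeGetAttr(const struct sketchNode *n, const char *name)
{
	const struct sketchAttr *a;
	assert(n && name);
	for (a = n->attrs; a; a = a->next)
		if (a->name && strcmp(a->name, name) == 0)
			return a->value;
	return NULL;
}

/* Link child as the last child of parent. */
void nodeAddChild(struct sketchNode *parent, struct sketchNode *child)
{
	assert(parent && child);
	child->parent = parent;
	if (parent->lastChild)
		parent->lastChild->sibling = child;
	else
		parent->firstChild = child;
	parent->lastChild = child;
}

/* Free a node, its attributes, and its whole subtree. */
void nodeFree(struct sketchNode *n)
{
	struct sketchNode *c, *nextChild;
	struct sketchAttr *a, *nextAttr;
	if (!n)
		return;
	for (c = n->firstChild; c; c = nextChild) {
		nextChild = c->sibling;
		nodeFree(c);
	}
	for (a = n->attrs; a; a = nextAttr) {
		nextAttr = a->next;
		free(a->name);
		free(a->value);
		free(a);
	}
	free(n->tag);
	free(n->text);
	free(n);
}
```

With something like this, an anchor would be an element node holding href and target attributes, with the link text as a child text node. The attribute value would be stored already decoded, which assumes the parser has resolved & versus &amp; before the copy.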
I'll post more on this later.

Karl Dahlke
