Re: [uf-discuss] Parsing XFN in PHP

Julian Bond Thu, 10 Apr 2008 10:15:09 -0700

Ryan Parman <[EMAIL PROTECTED]> Thu, 10 Apr 2008 09:05:47

As someone with a background in parsing RSS/Atom, I can say from years of experience that RSS is only occasionally XML and that you typically find far more HTML in a feed than XML. And parsing HTML can be a bitch.


Big snip.

Woah! That's enough to put one off even starting on parsing and reading uF. Which makes uF all a bit pointless. Oh dear. :(

I suspect though that this Gordian knot can be cut. It seems quite likely that any page marked up with uF is good enough that HTML-Tidy won't remove too many uF marked up elements. If that's the case, then Fetch HTML -> HTML-Tidy -> XML parsing is going to get 99% of the job done and successfully extract the uF marked data. But that HTML-Tidy step is going to be indispensable. It just plain won't work without it. And the shortcut that folds even that step into the parse is DOMDocument->loadHTML($html), which is effectively doing the same thing.
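
For what it's worth, here is a minimal sketch of that shortcut, untested against real-world pages. The helper name extractXfn() and the example.com URLs are made up for illustration, and it simply returns every rel value it finds rather than checking them against the XFN vocabulary. It assumes PHP's bundled DOM and libxml extensions.

```php
<?php
// Sketch only: extractXfn() is a hypothetical helper, not part of any library.
// Returns one entry per <a> that carries both href and rel attributes.
function extractXfn(string $html): array
{
    $doc = new DOMDocument();

    // Real pages are rarely valid, so collect libxml's parse errors
    // internally instead of having them raised as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];

    // XFN values are space-separated tokens in the rel attribute of a link.
    foreach ($xpath->query('//a[@rel][@href]') as $a) {
        $rels = preg_split('/\s+/', trim($a->getAttribute('rel')), -1, PREG_SPLIT_NO_EMPTY);
        $links[] = ['href' => $a->getAttribute('href'), 'rel' => $rels];
    }

    return $links;
}

// Deliberately sloppy input: no <html> or <body>, unclosed <p> and <b>.
$html = '<p>Friends: <b><a href="http://example.com/alice" rel="friend met">Alice</a>'
      . '<p><a href="http://example.com/bob" rel="colleague">Bob</a>';

print_r(extractXfn($html));
```

Whether this holds up on seriously broken pages is exactly the open question; the sketch only shows that the parser copes with mildly sloppy markup without a separate Tidy pass.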

It would be interesting to do some interop testing and see just how bad a web page has to be before the uF starts getting missed.


And a uF validator would come in handy there.

--
Julian Bond  E&MSN: julian_bond at voidstar.com  M: +44 (0)77 5907 2173
Webmaster:          http://www.ecademy.com/      T: +44 (0)192 0412 433
Personal WebLog:    http://www.voidstar.com/     skype:julian.bond?chat
                           Tastes Like Milk
_______________________________________________
microformats-discuss mailing list
microformats-discuss@microformats.org
http://microformats.org/mailman/listinfo/microformats-discuss
