Re: [uf-discuss] Re: Parsing XFN in PHP

Ryan Parman Thu, 10 Apr 2008 11:42:50 -0700

On Apr 10, 2008, at 10:04 AM, Julian Bond wrote:

Ryan Parman <[EMAIL PROTECTED]> Thu, 10 Apr 2008 09:05:47
As someone with a background in parsing RSS/Atom, I can say from years of experience that RSS is only occasionally XML and that you typically find far more HTML in a feed than XML. And parsing HTML can be a bitch.
Big snip.
Woah! That's enough to put one off even starting on parsing and reading uF. Which makes uF all a bit pointless. Oh dear. :(


Sarcasm noted. ;)

I suspect though that this Gordian knot can be cut. It seems quite likely that any page marked up with uF is good enough that HTML-Tidy won't remove too many uF marked up elements. If that's the case, then Fetch html -> HTML-Tidy -> XML parsing is going to get 99% of the job done and successfully extract the uF marked data. But that HTML-Tidy step is going to be indispensable. It just plain won't work without it. And the shortcut that reduces even that step is DomDocument->loadHtml($html) which is effectively doing the same thing.


On Apr 10, 2008, at 10:34 AM, Toby A Inkster wrote:

http://www.php.net/manual/en/function.dom-domdocument-loadhtml.php

This is interesting -- especially if it works. However the version information is noted as CVS-only. Is this in a shipping version of PHP yet?

Using HTML-Tidy is a fairly big gotcha for most people on shared hosting. I don't know the stats, but I would guess that not many hosting providers have this installed. I have access to dedicated hardware, so I'm definitely interested in this (assuming it works as expected, of course), but I'm concerned about the community at-large.
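
For anyone who wants to experiment, here is a minimal sketch of the loadHTML() route with no HTML-Tidy step. It assumes a PHP build where the DOM extension and loadHTML() are actually available, which is the open question above. extract_xfn() is a hypothetical helper name of my own, not part of any library, and the whitelist is simply the XFN 1.1 value set.

```php
<?php
// Sketch only: pull XFN rel values out of possibly malformed HTML
// by letting libxml's HTML parser do the cleanup that HTML-Tidy would.
// extract_xfn() is a hypothetical helper, not an existing API.

function extract_xfn(string $html): array
{
    // XFN 1.1 relationship values.
    $xfn = ['friend', 'acquaintance', 'contact', 'met', 'co-worker',
            'colleague', 'co-resident', 'neighbor', 'child', 'parent',
            'sibling', 'spouse', 'kin', 'muse', 'crush', 'date',
            'sweetheart', 'me'];

    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();

    // Tag soup produces parse warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href') || !$a->hasAttribute('rel')) {
            continue;
        }
        // rel holds a space-separated list, e.g. "friend met".
        $rels = preg_split('/\s+/', strtolower($a->getAttribute('rel')),
                           -1, PREG_SPLIT_NO_EMPTY);
        // Keep only the values that are XFN relationships.
        $rels = array_values(array_intersect($rels, $xfn));
        if ($rels) {
            $links[] = ['href' => $a->getAttribute('href'), 'rel' => $rels];
        }
    }
    return $links;
}
```

Calling extract_xfn() on a fragment with an unclosed paragraph, such as '<p>Unclosed <a href="http://example.com/" rel="friend met nofollow">Jane</a><p>more', should give back one entry for http://example.com/ with the rel values friend and met. How well this holds up on really broken pages is exactly what would need testing.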


On Apr 10, 2008, at 10:04 AM, Julian Bond wrote:

It would be interesting to do some interop testing and see just how bad a web page has to be before the uF starts getting missed.



I agree.

--
Ryan Parman
<http://ryanparman.com>



_______________________________________________
microformats-discuss mailing list
microformats-discuss@microformats.org
http://microformats.org/mailman/listinfo/microformats-discuss
