Hi Nick!
> I'm not sure where and how you're manipulating the DOM but I'd also
> be curious as to how it works with potentially horribly XML unfriendly
> content eg something that has been posted that originated in Microsoft
> Word for example. I just remember in some of the PHP4
> XML based templating engines I played with that they had a tendency
> to choke on the kind of real world content that users put in.
Yes, I was also thinking of Word and the like when implementing the
DOM-based approach :/
Initially, I used regular expressions to find all a href and form action
attributes for link replacement. Unfortunately, I'm no regex expert, so
that turned out to be quite difficult for me. I was also afraid that a
regex would only solve a different part of the problem: it might cope with
both valid and malformed documents, but not with every form that an a href
link or a form action can take.
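A small illustration of that concern, with a made-up pattern and made-up
sample links (not the regex from the original attempt): three links a
browser would treat alike, of which a naive pattern only catches the one
written in lowercase with double quotes.

```php
<?php
// Three links a browser treats the same way; only the first uses
// lowercase tag/attribute names and a double-quoted value.
$samples = array(
    '<a href="a.php">x</a>',
    "<A HREF='b.php'>x</A>",
    '<a class=nav href=c.php>x</a>',
);

// Deliberately naive pattern, invented for this illustration.
$naive = '/<a href="([^"]*)"/';

$matched = array();
foreach ($samples as $sample) {
    if (preg_match($naive, $sample, $m)) {
        $matched[] = $m[1];
    }
}

// Shows which of the three URLs the pattern actually found.
print_r($matched);
```

Each variant can be handled by extending the pattern, but the list of
variants keeps growing, which is the maintenance problem described above.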
The current code basically looks like this:
    $responseDoc = new DOMDocument();
    $responseDoc->loadHTML($response);

    // process the form action links
    $formTags = $responseDoc->getElementsByTagName("form");
    foreach ($formTags as $formTag)
    {
        if ($formTag->hasAttribute("action"))
        {
            $action = $formTag->getAttribute("action");
            $newAction = $this->_postProcessUrl($action,
                $previousPortletactionParam);
            $formTag->setAttribute("action", $newAction);
        }
    }
which was really easy to implement. Do you see a way to improve the
parsing part?
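
One possible direction, as a minimal sketch rather than a finished
implementation: loadHTML() goes through libxml's HTML parser, which is
fairly tolerant of malformed markup but reports every problem as a PHP
warning unless libxml_use_internal_errors() is switched on. The sketch
below does that and handles a href and form action in one loop. The names
rewriteUrls() and $rewrite, and the /portal?target= URL, are hypothetical
stand-ins for the _postProcessUrl() call in the snippet above.

```php
<?php
// Sketch only: rewriteUrls() and the $rewrite callback are hypothetical
// names standing in for the _postProcessUrl() call shown earlier.
function rewriteUrls(string $html, callable $rewrite): string
{
    // Collect libxml parse errors internally instead of raising a PHP
    // warning for every piece of invalid markup.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Tag name => attribute that carries the URL to rewrite.
    $targets = array('form' => 'action', 'a' => 'href');

    foreach ($targets as $tagName => $attribute) {
        foreach ($doc->getElementsByTagName($tagName) as $element) {
            if ($element->hasAttribute($attribute)) {
                $old = $element->getAttribute($attribute);
                $element->setAttribute($attribute, $rewrite($old));
            }
        }
    }

    return $doc->saveHTML();
}

// Deliberately sloppy input: unquoted attributes, unclosed tags.
$messy = '<p><a href=page.php?id=1>next</a><form action="save.php"><input name=x>';

// Hypothetical rewrite rule, used only to make the effect visible.
$result = rewriteUrls($messy, function ($url) {
    return '/portal?target=' . urlencode($url);
});

echo $result;
```

Note that saveHTML() returns a complete document here, so a fragment comes
back wrapped in html and body tags; whether that matters depends on how the
response is embedded afterwards.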
regs,
Stephan