Chris Hastie
Sat, 13 Oct 2007 04:45:55 -0700
I've attached some resource files for producing XML output. They are specifically intended to be pulled into PHP using XML_Unserializer with minimum options. An example of this in action can be seen at <http://www.tree-care.info/uktc/archive>.

There are two main files. xml-nested.mrc was my starting point. This was eventually abandoned, so it is less well developed than xml.mrc, but it is included for information. The difference between the two is in how threads are represented, which also affects the threadslices in messages.

xml-nested produces XML with a <list> element which contains <message> elements. Each <message> element represents the top of a thread. A <message> element may contain a <followups> element, which itself contains further <message> elements.

A couple of issues with the theory on the version of MHonArc I originally tested this on (2.6.8):

- Threadslices don't seem to flatten properly. Instead of closing a <message> tag and opening another, a new <message> tag is opened inside the old one, and then all are closed at the end. Because it's flattened, the <followups> container is missed.

- I couldn't get rid of a </followups><followups> output between 'proper' threaded messages and subject threaded ones. Consequently a message with 'possible followups' ends up with two <followups> elements.

This structure can be parsed in PHP thus:

    // XML_Unserializer comes from the PEAR XML_Serializer package
    require_once 'XML/Unserializer.php';

    // Instantiate the unserializer
    $Unserializer = new XML_Unserializer();
    // Unserialize the XML document
    $status = $Unserializer->unserialize($doc, TRUE);
    $data = $Unserializer->getUnserializedData();
    $list = $data->list;

    echo "<ul>\n";
    foreach ($list as $msgobj) {
        parsemsgobj($msgobj);
    }
    echo "</ul>\n";

    ###########################################
    // Print one message as a list item, then recurse into its followups
    function parsemsgobj($msgobj) {
        if ($msgobj->type == 'empty') {
            echo "<li style=\"list-style-type: none\">\n";
        } else {
            echo "<li>\n" . $msgobj->subject . ' (' . $msgobj->fromname . ")\n";
        }
        if (is_array($msgobj->followups)) {
            foreach ($msgobj->followups as $thisobj) {
                if (is_object($thisobj)) {
                    echo "<ul>\n";
                    parsemsgobj($thisobj);
                    echo "</ul>\n";
                } elseif (is_array($thisobj)) {
                    // A second <followups> element shows up as a nested array
                    foreach ($thisobj as $thisobj2) {
                        if (is_object($thisobj2)) {
                            echo "<ul>\n";
                            parsemsgobj($thisobj2);
                            echo "</ul>\n";
                        }
                    }
                }
            }
        }
        echo "</li>\n";
    }

The reason for abandoning this approach was that, whilst it can be parsed in PHP, it is less easy to just chuck the data structure at a Smarty template and get that to do the work.

My second attempt creates a flat list of all messages in thread order. Information on thread depth is included. The resulting data structure, when pulled into PHP, can be parsed in Smarty without too much difficulty.

#### PHP

    <?php
    // Instantiate the unserializer
    $Unserializer = new XML_Unserializer();
    // Unserialize the XML document
    $status = $Unserializer->unserialize($doc, TRUE);
    $data = $Unserializer->getUnserializedData();
    $msglist = $data->list;

    $template = new template('mhonarcxmlmodule','_viewindex',$loc);
    $template->assign('msglist',$msglist);
    ?>

#### Smarty

    {assign var=lastDepth value=0}
    {assign var=maxdepth value=4}
    {assign var=start value=1}
    <div id="mh_threadlist">
    <ul>
    {foreach from=$msglist item=msg}
      {if $msg->depth < $maxdepth}
        {assign var=currentdepth value=$msg->depth}
      {else}
        {assign var=currentdepth value=$maxdepth}
      {/if}
      {if $start==1}
        {assign var=start value=0}
        {if $currentdepth > 0}
          <li class="threadstartindent" >
          <span class="continued">{$msg->tsubject} continued</span><ul>
          {section name=foo loop=$currentdepth-1}
            <li class="threadstartindent" ><ul>
          {/section}
        {/if}
      {else}
        {if $currentdepth == $lastDepth}
          </li>
        {elseif $currentdepth > $lastDepth}
          <ul>
        {else}
          {section name=guff loop=$lastDepth-$currentdepth}
            </li></ul>
          {/section}
          </li>
        {/if}
      {/if}
      {if $msg->current == 'Yes'}
        <li><span class="urhere">{$msg->subject}</span><br />
        <span class="msgauthor">{$msg->fromname}</span>
      {else}
        <li><a href="blah" >{$msg->subject}</a><br />
        <span class="msgauthor">{$msg->fromname}</span>
      {/if}
      {assign var=lastDepth value=$currentdepth}
    {/foreach}
    {section name=guff2 loop=$lastDepth}</li></ul>{/section}
    </li></ul>
    </div>

There are some issues to iron out still. I've got problems with control characters turning up in the XML, which I've tackled with processing in the PHP app before trying to parse the XML. I've converted an archive of around 43,000 pages to this, and indexing it has thrown up 56 pages where the XML fails to parse. A quick initial look suggests a significant proportion of these involve messages which contain attached messages, so I guess this is something to look at.

By the way, the reason for the

    <ATTACHMENTURL> %ATTACHMENTURLBASE% </ATTACHMENTURL>

is to allow a str_replace() to be used to set the attachment URL.

-- 
Chris Hastie
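
The two PHP-side steps mentioned in the message (cleaning control characters out of the XML before parsing it, and setting the attachment URL with str_replace()) might look roughly like the sketch below. It is only an illustration under stated assumptions: the function names, the sample XML and the example.org URL are invented here and are not taken from the attached resource files, and the set of characters stripped is a guess based on what XML 1.0 disallows rather than what the original application actually removes.

```php
<?php
// Hypothetical helpers: these names do not come from the attached resource files.

// Remove control characters that XML 1.0 does not allow
// (everything below 0x20 except tab, line feed and carriage return).
function strip_xml_control_chars($xml)
{
    $clean = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $xml);
    // preg_replace() returns NULL on error; fall back to the input in that case
    return $clean === null ? $xml : $clean;
}

// Swap the %ATTACHMENTURLBASE% placeholder for a real base URL.
function set_attachment_base($xml, $baseUrl)
{
    return str_replace('%ATTACHMENTURLBASE%', $baseUrl, $xml);
}

// Made-up sample input containing a stray BEL character (0x07)
$doc = "<message><subject>Oak\x07 decay</subject>"
     . "<ATTACHMENTURL>%ATTACHMENTURLBASE%</ATTACHMENTURL></message>";

$doc = strip_xml_control_chars($doc);
$doc = set_attachment_base($doc, 'http://www.example.org/attachments');

echo $doc, "\n";
```

Both steps work on the raw XML string, so they would need to run before the document is handed to XML_Unserializer.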