Hi, Ryan:

FWIW, I had success with a proof of concept that did the conversion outside the database, storing the original wikitext as text for edit operations and storing the converted HTML as XHTML for query operations (doing the conversions with Mylyn WikiText [1] in Java).

An XQuery wiki parser would of course be better but, as David says, would not be easy. I just noticed that an upcoming paper explores the issues for MediaWiki [2]; I haven't read it yet. For the undaunted, a place to start might be WikiCreole [3], which tries to define a fallback wikitext for the subset of constructs common to most wikis.

Anyway, storing XHTML only works if the wikitext converter generates POSH (plain old semantic HTML) [4] instead of painted HTML.

The XHTML approach might be improved by toggling the microformats: in essence, replacing the overloaded HTML element with a non-HTML element that has the same name as the class attribute value, and stuffing the original HTML element name in a non-HTML attribute or container element. That way, you could index with sensitivity to the microformat but revert to plain old XHTML on the fly. (Of course, that forbids unions, that is, multiple microformats on a single HTML element.)

Erik Hennum

[1] http://wiki.eclipse.org/Mylyn/Incubator/WikiText
[2] http://dirkriehle.com/2011/07/29/design-and-implementation-of-the-sweble-wikitext-parser/
[3] http://wiki.wikicreole.org/
[4] http://microformats.org/wiki/posh

Sent: Sunday, September 11, 2011 8:41 AM
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Wiki markup parser in XQuery?

I would absolutely LOVE such a thing. Problems abound though; it's not trivial.
First off, there are many wiki variants, and they are not all the same.
Second is a subtle hand-off to HTML. Wiki parsers convert to HTML and rely on HTML to format the text. If you want to pull *structure* from that, it can be tricky. Line breaks, blank spaces, etc. really confuse the work.

Sent: Saturday, September 10, 2011 9:57 PM
To: general@developer.marklogic.com
Subject: [MarkLogic Dev General] Wiki markup parser in XQuery?

Anyone know of a Wiki markup parser written in XQuery? Haven't found one from my searching.
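
To make the microformat toggle from the reply above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the code from the proof of concept: the function names `to_microformat` and `to_xhtml` and the attribute name `html-name` are hypothetical, the input is assumed to be well-formed XHTML without a namespace, and elements carrying more than one class value (the "union" case) are simply left alone.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical non-HTML attribute that remembers the original element name.
ORIG_ATTR = "html-name"

# Conservative check that a class value can serve as an XML element name.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def to_microformat(root):
    """Rename each element with a single class value to that value."""
    for el in root.iter():
        tokens = (el.get("class") or "").split()
        # Unions (several class values) cannot be expressed as one
        # element name, so those elements stay as plain XHTML.
        if len(tokens) != 1 or not NAME_RE.fullmatch(tokens[0]):
            continue
        el.set(ORIG_ATTR, el.tag)   # stash the HTML element name
        el.tag = tokens[0]          # element is now named after the class
        del el.attrib["class"]
    return root


def to_xhtml(root):
    """Undo to_microformat, restoring the element name and class."""
    for el in root.iter():
        orig = el.get(ORIG_ATTR)
        if orig is None:
            continue
        el.set("class", el.tag)
        el.tag = orig
        del el.attrib[ORIG_ATTR]
    return root


if __name__ == "__main__":
    src = '<div class="vcard"><span class="fn">Jane Doe</span></div>'
    doc = ET.fromstring(src)
    print(ET.tostring(to_microformat(doc), encoding="unicode"))
    print(ET.tostring(to_xhtml(doc), encoding="unicode"))
```

The toggled form is what you would store and index, so that an element such as `vcard` or `fn` can be targeted by name; `to_xhtml` gives back the plain XHTML when a page has to be rendered.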