Mike and Erik,

We should be able to use one or more of the suggested solutions.

Thanks for the help.

- Keith

-----Original Message-----
From: [email protected] On Behalf Of Erik Hennum
Sent: Thursday, February 19, 2015 7:57 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Preserving the HTML5 doctype

Hi, Keith:

Per Justin's (and Mike's) good advice, you should be able to do the conversion using a REST API transform with the xdmp:quote() function in an XQuery transform or the xsl:output statement in an XSLT transform (or the xdmp.quote() function in a JavaScript transform in MarkLogic 8).

A possible XQuery transform:

xquery version "1.0-ml";

module namespace html5ifier = "http://marklogic.com/rest-api/transform/html5ifier";

declare default function namespace "http://www.w3.org/2005/xpath-functions";
declare option xdmp:mapping "false";

declare function html5ifier:transform(
    $context as map:map,
    $params  as map:map,
    $content as document-node()
) as document-node()
{
    map:put($context,"output-type","text/html"),
    document{text{
        xdmp:quote($content,
            <options xmlns="xdmp:quote">
                <method>html</method>
                <media-type>text/html</media-type>
                <doctype-public>html</doctype-public>
            </options>)
    }}
};

You would install the transform by PUT to:

http://localhost:8011/v1/config/transforms/html5ifier

Then, you would GET the persisted XHTML document using the transform:

http://localhost:8011/v1/documents?uri=/path/to/the/doc.xhtml&transform=html5ifier

You might find that you need to make additional changes to the XHTML document within the transform (either on the XML before quoting or on the string after quoting), but that should get you closer.

Hoping that helps,

Erik Hennum

________________________________________
From: [email protected] on behalf of Michael Blakeley [[email protected]]
Sent: Wednesday, February 18, 2015 2:43 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Preserving the HTML5 doctype

MarkLogic doesn't store doctypes with XML documents. As I understand it this is mostly because they don't exist in the XQuery data model: http://www.w3.org/TR/xpath-datamodel/

Have you looked at Justin's post http://markmail.org/message/qmsos7np64ohyctp already? That approach presumes you can run everything through an output module in XQuery or XSLT (or maybe even JavaScript now?).

The app-server output options might help too, but I'm not sure if those can handle an HTML5 doctype.

-- Mike

> On 18 Feb 2015, at 12:43, Keith Breinholt <[email protected]> wrote:
>
> We have a requirement to store and retrieve well formed HTML5 documents in
> MarkLogic using Java API or REST API.
>
> Each document has an '.html' extension and the standard HTML5 doctype
> <!DOCTYPE html>. When documents are inserted, by default they get stored as
> text documents.
>
> We would like to use all the goodness that MarkLogic provides for search and
> manipulation of the documents as if they were XHTML, but we need to preserve
> the HTML5 doctype and .html extension for compatibility with other tools. I
> am sure we are not the only ones to have encountered this scenario.
>
> We have tried changing the html mimetype to xml but when documents are
> inserted the doctype gets replaced with the XML doctype. Is there a way to
> insert and retrieve well formed HTML5 documents without losing the doctype?
>
> Keith Breinholt
> "If you cannot describe what you are doing as a process, you don't
> know what you are doing." - W. Edwards Deming
>
>
> NOTICE: This email message is for the sole use of the intended recipient(s)
> and may contain confidential and privileged information. Any unauthorized
> review, use, disclosure or distribution is prohibited. If you are not the
> intended recipient, please contact the sender by reply email and destroy all
> copies of the original message.
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
