Mike and Erik,

We should be able to use one or more of the suggested solutions.

Thanks for the help.

- Keith

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Erik Hennum
Sent: Thursday, February 19, 2015 7:57 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Preserving the HTML5 doctype

Hi, Keith:

Per Justin's (and Mike's) good advice, you should be able to do the conversion 
using a REST API transform with the xdmp:quote() function in an XQuery 
transform or the xsl:output statement in an XSLT transform (or the xdmp.quote() 
function in a JavaScript transform in MarkLogic 8).

A possible XQuery transform:

xquery version "1.0-ml";
module namespace html5ifier =
    "http://marklogic.com/rest-api/transform/html5ifier";

declare default function namespace
    "http://www.w3.org/2005/xpath-functions";
declare option xdmp:mapping "false";

declare function html5ifier:transform(
    $context as map:map,
    $params  as map:map,
    $content as document-node()
) as document-node()
{
    map:put($context,"output-type","text/html"),

    document{text{
        xdmp:quote($content,
            <options xmlns="xdmp:quote">
                <method>html</method>
                <media-type>text/html</media-type>
                <doctype-public>html</doctype-public>
            </options>)
        }}
};

You would install the transform by PUT to:

    
http://localhost:8011/v1/config/transforms/html5ifier

Then, you would GET the persisted XHTML document using the transform

    
http://localhost:8011/v1/documents?uri=/path/to/the/doc.xhtml&transform=html5ifier

You might find that you need to make additional changes to the XHTML document 
within the transform (either on the XML before quoting or on the string after 
quoting), but that should get you closer.
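
The PUT and GET steps above could be scripted from a client. The following is only a rough Python sketch, not something from the REST API documentation verbatim: it assumes the REST instance at localhost:8011 mentioned above, assumes the transform source is sent with a Content-Type of application/xquery, and assumes the server uses digest authentication. The function names and the user/password values are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import (
    HTTPDigestAuthHandler,
    HTTPPasswordMgrWithDefaultRealm,
    Request,
    build_opener,
)

BASE = "http://localhost:8011"  # host and port taken from the message above


def install_request(name, xquery_source, base=BASE):
    """Build the PUT that installs an XQuery transform under /v1/config/transforms."""
    return Request(
        f"{base}/v1/config/transforms/{name}",
        data=xquery_source.encode("utf-8"),
        # Assumption: XQuery transform source is sent as application/xquery.
        headers={"Content-Type": "application/xquery"},
        method="PUT",
    )


def fetch_request(doc_uri, transform, base=BASE):
    """Build the GET that reads a document through the named transform."""
    query = urlencode({"uri": doc_uri, "transform": transform})
    return Request(f"{base}/v1/documents?{query}", method="GET")


def make_opener(user, password, base=BASE):
    """Return an opener that answers digest challenges (assumed auth scheme)."""
    manager = HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, base, user, password)
    return build_opener(HTTPDigestAuthHandler(manager))
```

With a running server, something like `make_opener("user", "password").open(fetch_request("/path/to/the/doc.xhtml", "html5ifier")).read()` would return the serialized HTML; the credentials here are placeholders.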


Hoping that helps,


Erik Hennum

________________________________________
From: [email protected] 
[[email protected]] on behalf of Michael Blakeley 
[[email protected]]
Sent: Wednesday, February 18, 2015 2:43 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Preserving the HTML5 doctype

MarkLogic doesn't store doctypes with XML documents. As I understand it this is 
mostly because they don't exist in the XQuery data model: 
http://www.w3.org/TR/xpath-datamodel/

Have you looked at Justin's post 
http://markmail.org/message/qmsos7np64ohyctp already? That approach presumes
you can run everything through an output
module in XQuery or XSLT (or maybe even JavaScript now?).

The app-server output options might help too, but I'm not sure if those can 
handle an HTML5 doctype.

-- Mike

> On 18 Feb 2015, at 12:43 , Keith Breinholt <[email protected]> wrote:
>
> We have a requirement to store and retrieve well formed HTML5 documents in 
> MarkLogic using Java API or REST API.
>
> Each document has an '.html' extension and the standard HTML5 doctype 
> <!DOCTYPE html>.  When documents are inserted, by default they get stored as 
> text documents.
>
> We would like to use all the goodness that MarkLogic provides for search and 
> manipulation of the documents as if they were XHTML, but we need to preserve 
> the HTML5 doctype and .html extension for compatibility with other tools.  I 
> am sure we are not the only ones to have encountered this scenario.
>
> We have tried changing the html mimetype to xml but when documents are 
> inserted the doctype gets replaced with the XML doctype.  Is there a way to 
> insert and retrieve well formed HTML5 documents without losing the doctype?
>
> Keith Breinholt
> "If you cannot describe what you are doing as a process, you don't 
> know what you are doing." - W. Edwards Deming
>
>
> NOTICE: This email message is for the sole use of the intended recipient(s) 
> and may contain confidential and privileged information. Any unauthorized 
> review, use, disclosure or distribution is prohibited. If you are not the 
> intended recipient, please contact the sender by reply email and destroy all 
> copies of the original message.
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
