Hi Dan,

I think that if you search the docs you will see MarkLogic is quite clear
about what is and is not supported (you just need to find the right
sections ;-).

I disagree that, just because MarkLogic accepts a DOCTYPE, it should also
use it to validate on read (however simple that may seem, it isn't). Most
systems I worked with during my roughly 15 years of XML experience didn't
do so, nor is *any* XML parser required, or even supposed, to do so.

But I do agree that at least reading entities from external DTDs (as well
as correctly handling the encoding given in the XML declaration), as most
ordinary XML parsers do, would have been very convenient for uploading
DTD-style and non-Unicode XML, which I have faced a lot myself during
those years.
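Until then, something close to that can be done by hand. Since MarkLogic does parse entity declarations in the internal subset, a rough sketch is to load the entity set file (for example isopub.ent) as text, keep only its `<!ENTITY ...>` declarations, and inject them into a new internal subset before parsing. This is untested; the paths, the URI and the root name `article` are hypothetical, and it assumes the entity file holds only simple declarations (no conditional sections, no parameter-entity references inside declarations, no literal `>` in entity values).

```xquery
xquery version "1.0-ml";

(: Hypothetical paths, URI and root element name; adjust to your data. :)
declare variable $xml-path := "/data/article.xml";
declare variable $ent-path := "/data/isopub.ent";
declare variable $document-uri := "/articles/article.xml";
declare variable $root-name := "article";

(: Load files as text so the parser does not reject the entity refs yet. :)
declare variable $as-text :=
  <options xmlns="xdmp:document-get">
    <format>text</format>
  </options>;

(: Drop comments, then keep every <!ENTITY ...> declaration up to its
   closing '>'. Assumes no entity value contains a literal '>'. :)
declare function local:entity-decls($ent-text as xs:string) as xs:string
{
  let $no-comments := fn:replace($ent-text, "<!--.*?-->", "", "s")
  return fn:string-join(
    for $chunk in fn:subsequence(fn:tokenize($no-comments, "<!ENTITY"), 2)
    return fn:concat("<!ENTITY", fn:substring-before($chunk, ">"), ">"),
    "&#10;")
};

let $xml := fn:string(xdmp:document-get($xml-path, $as-text))
let $ent := fn:string(xdmp:document-get($ent-path, $as-text))
(: The XML declaration must come first and only one DOCTYPE is allowed,
   so strip the originals before prepending the new one. :)
let $body := fn:replace(
  fn:replace($xml, '^\s*<\?xml[^>]*\?>', ''),
  '^\s*<!DOCTYPE[^\[>]*(\[[^\]]*\])?\s*>', '')
let $doctype := fn:concat("<!DOCTYPE ", $root-name, " [",
                          local:entity-decls($ent), "]>")
return xdmp:document-insert($document-uri,
                            xdmp:unquote(fn:concat($doctype, $body)))
```

A DTD like journalpublishing3.dtd pulls in several .ent files; the same function could be applied to each of them and the results concatenated into one internal subset.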

Kind regards,
Geert

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of [email protected]
Sent: Tuesday, 3 January 2012 17:31
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Invalid entity reference "ndash"

Do they say schema support or XML support? Validation is based upon
reading either the schema or the DTD, in whatever way it is referenced
or constructed. So if they provide any DOCTYPE support, which can only
mean DTDs, then I would expect it to be easy to read the DTD and validate
as configured.

This would have complicated my project had it gone through. It would not
have shut it down, but I would have been disappointed in the support for
something I consider key for any documentation project.

..dan


> Hi Dan,
>
> It surprised me a bit too. But I'm not sure the XML recommendation
> requires XML parsers to support DTDs at all (I can't seem to find the
> relevant section). MarkLogic Server does have very good XML Schema
> support, so I wouldn't say it doesn't validate at all. It just focuses
> on XML Schema rather than on DTDs (or on both).
>
> Kind regards,
> Geert
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of [email protected]
> Sent: Tuesday, 3 January 2012 16:53
> To: General MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Invalid entity reference "ndash"
>
> That is an interesting limitation I was not aware of. It works with XML
> documents but does not provide full validation capabilities, or the
> ability to work with valid documents as is.
>
> I got some initial training and an introduction to MarkLogic, but then
> never got to actually implement anything on the project.
>
> ..dan
>
>
>> Hi John,
>>
>>
>>
>> MarkLogic Server has only very limited support for DOCTYPE rules. Only
>> entity declarations in the internal subset are parsed and used.
>> References to any external entity or DTD file are ignored. That is why
>> a DTD reference doesn't work.
>> Ron gave a work-around (by the way, I posted similar code to handle
>> mixed encodings some while ago), but that is pretty expensive if you
>> need to load many docs. In that case you might prefer to use xmlsh,
>> RecordLoader or any of the other available tools to insert your data.
>> These have better support for DOCTYPEs.
>>
>>
>>
>> I do recall another workaround, which might be acceptable for you.
>> There is a repair option that defaults to none. If you change it to
>> full, it should accept most of the ISO entities and convert them to the
>> appropriate Unicode characters automatically. Full repair might do more
>> than you need, though, if the XML is not well-formed.
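A minimal sketch of that repair workaround, assuming the `repair` option of `xdmp:document-get` (values `full` or `none`). Whether full repair really maps every ISO entity such as `&ndash;` to its Unicode character is the claim made in the message, not something verified here; the path and URI are placeholders.

```xquery
xquery version "1.0-ml";

(: Placeholder path and URI; adjust to your environment. :)
declare variable $file-path := "/test.xml";
declare variable $document-uri := "/test/test.xml";

(: Parse as XML, but ask the loader to try to repair problems such as
   undeclared entity references rather than reject the document.
   Full repair may also alter other malformed markup in the file. :)
let $doc := xdmp:document-get(
  $file-path,
  <options xmlns="xdmp:document-get">
    <format>xml</format>
    <repair>full</repair>
  </options>)
return xdmp:document-insert($document-uri, $doc)
```

Compared with the load-as-text approach, this avoids building an extra string copy of each document, at the cost of less control over what the repair step changes.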
>>
>>
>>
>> Kind regards,
>>
>> Geert
>>
>>
>>
>> *From:* [email protected] [mailto:
>> [email protected]] *On Behalf Of *John Zhong
>> *Sent:* Friday, 30 December 2011 18:17
>> *To:* General MarkLogic Developer Discussion
>> *Subject:* Re: [MarkLogic Dev General] Invalid entity reference "ndash"
>>
>>
>>
>> Yes, I made sure the DTD and the associated .ent files are in the
>> correct location. What I was saying is that the xdmp:document-get
>> function does not work in this case.
>>
>> Actually, I did a simple test by defining a simple XML file and DTD:
>>
>> test.dtd:
>>
>> <!ELEMENT test (#PCDATA)>
>> <!ENTITY ndash "&#x02013;">
>>
>> test.xml:
>>
>> <?xml version="1.0" encoding="utf-8"?>
>> <!DOCTYPE test SYSTEM "test.dtd">
>> <test>&ndash;</test>
>>
>> I can open the XML in IE without a problem (it shows an error if I
>> delete the entity definition from test.dtd). Then I tested the
>> xdmp:document-get function again, and it shows an error:
>>
>> [1.0-ml] XDMP-DOCENTITYREF: xdmp:document-get("/test.xml") -- Invalid
>> entity reference "ndash" at /test.xml line 3
>>
>> John
>>
>> On Sat, Dec 31, 2011 at 12:17 AM, Dan Vint <[email protected]> wrote:
>>
>> You need to make sure the external text entity is
>> being read (or that the file being referenced is
>> being found). You may not actually be reading in
>> the DTD file itself. The way it is configured is
>> correct; your code just needs to make sure it is
>> expanding the DOCTYPE via the system or public
>> identifier (journalpublishing3.dtd, or via an XML
>> catalog for "-//NLM//DTD Journal Publishing DTD v3.0
>> 20080202//EN") and the associated files before it starts parsing the
>> content.
>>
>> ..dan
>>
>>
>>
>> At 07:28 AM 12/30/2011, you wrote:
>>>Thank you for your note, Erik.
>>>
>>>And thank you for your code, Ron. Yes, I did try that and it worked.
>>>
>>>The XML file already has a DTD declaration,
>>><!DOCTYPE article PUBLIC "-//NLM//DTD Journal
>>>Publishing DTD v3.0 20080202//EN"
>>>"journalpublishing3.dtd">, and the DTD file also
>>>references an isopub.ent file that defines this
>>>entity: <!ENTITY ndash "&#x02013;" ><!--=en dash -->
>>>
>>>So it seems I have to make the entity definition appear in the XML itself:
>>>
>>><!DOCTYPE article PUBLIC "-//NLM//DTD Journal
>>>Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd"
>>>[<!ENTITY ndash "&#x2013;">]
>>> >
>>>
>>>John
>>>
>>>On Fri, Dec 30, 2011 at 10:42 PM, Ron Hitchens <[email protected]> wrote:
>>>
>>>  Try this code, adjusting the name of your root
>>>node as needed. Other entities could also be defined
>>>in the doc-type header.
>>>
>>>  This loads the XML first as text, prepends a doc-type
>>>header that defines the entity and then parses the result
>>>as XML. This requires making an extra copy of the document
>>>in memory, so it could bump against memory limits if you
>>>do it in volume with lots of large documents.
>>>
>>>  Handy list of entity definitions here:
>>>http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
>>>
>>>xquery version '1.0-ml';
>>>
>>>(: path of XML doc in filesystem :)
>>>declare variable $file-path := "/tmp/z.xml";
>>>
>>>(: URI to insert it as in MarkLogic :)
>>>declare variable $document-uri := "/test/mydoc.xml";
>>>
>>>(: force loading as text, not as XML :)
>>>declare variable $load-options :=
>>>   <options xmlns="xdmp:document-get">
>>>     <format>text</format>
>>>   </options>;
>>>
>>>(: adjust root node name and add entities as needed :)
>>>declare variable $doctype-decl as xs:string :=
>>>  '<!DOCTYPE root [<!ENTITY mdash "&#x2014;"><!ENTITY ndash "&#x2013;">]>';
>>>
>>>let $doc-as-text := fn:string(xdmp:document-get($file-path, $load-options))
>>>(: the XML declaration must come first and only one DOCTYPE is allowed,
>>>   so strip any the file already has before prepending ours :)
>>>let $no-decl := fn:replace($doc-as-text, '^\s*<\?xml[^>]*\?>', '')
>>>let $no-doctype :=
>>>  fn:replace($no-decl, '^\s*<!DOCTYPE[^\[>]*(\[[^\]]*\])?\s*>', '')
>>>let $doc-with-decl := fn:concat($doctype-decl, $no-doctype)
>>>let $doc := xdmp:unquote($doc-with-decl)
>>>
>>>return xdmp:document-insert($document-uri, $doc)
>>>
>>>
>>>On Dec 30, 2011, at 6:05 AM, John Zhong wrote:
>>>
>>> > Thanks for your quick answer, Harry.
>>> >
>>> > But what if I don't want to modify the original XML?
>>> >
>>> > Thanks,
>>> > John
>>> >
>>> > On Fri, Dec 30, 2011 at 1:59 PM, Harry B. <[email protected]> wrote:
>>> > Try using the numeric character reference instead
>>> >
>>> > &#8211;
>>> >
>>> > I can't remember why, but this usually works.
>>> >
>>> > On Dec 29, 2011 10:53 PM, "John Zhong" <[email protected]> wrote:
>>> > Hi all,
>>> >
>>> > I am having a problem using the
>>> xdmp:document-get function to read an XML file with
>>> the entity reference &ndash;. How can I
>>> fix this problem? I am using ML version 5.0-1.2.
>>> >
>>> > [1.0-ml] XDMP-DOCENTITYREF:
>>> xdmp:document-get("D:\test.xml") -- Invalid
>>> entity reference "ndash" at D:\test.xml line 231
>>> >
>>> > Thank you,
>>> > John
>>> >
>>> >
>>> > _______________________________________________
>>> > General mailing list
>>> > [email protected]
>>> > http://developer.marklogic.com/mailman/listinfo/general
>>>
>>>---
>>>Ron Hitchens {mailto:[email protected]}   Ronsoft Technologies
>>>    +44 7879 358 212 (voice)          http://www.ronsoft.com
>>>    +1 707 924 3878 (fax)             Bit Twiddling At Its Finest
>>>"No amount of belief establishes any fact." -Unknown
>>>
>>>
>>>
>>>
>>
>>
>
>> ---------------------------------------------------------------------------
>> Danny Vint
>>
>> Panoramic Photography
>> http://www.dvint.com
>>
>> voice: 619-938-3610
>>
>>
>
>
>


