Hi Dan,

It surprised me a bit too. But I am not sure the XML Recommendation
requires XML parsers to support DTDs at all (I can't seem to find the
relevant section). And MarkLogic Server has very good XML Schema support,
so I wouldn't say it doesn't validate at all. It just focuses on XML
Schema rather than on DTDs (or on both).

Kind regards,
Geert

-----Original message-----
From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On behalf of dv...@dvint.com
Sent: Tuesday, 3 January 2012 16:53
To: General MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Invalid entity reference "ndash"

That is an interesting limitation I was not aware of. It works with XML
documents but does not provide full validation capabilities, or the
ability to work with valid documents as is.

I got some initial training and an introduction to MarkLogic, but then
never got the project to actually implement anything.

..dan


> Hi John,
>
>
>
> MarkLogic Server has only very limited support for DOCTYPE declarations.
> Only entity declarations in the internal (local) subset are parsed and
> used. References to any external entity or DTD file are ignored. That is
> why a DTD reference doesn't work.
> Ron gave a workaround (I posted similar code to handle mixed encodings
> some while ago, by the way), but that is pretty expensive if you need to
> load many docs. In that case, you might prefer to use xmlsh, recordloader
> or any of the other available tools to insert your data. These have
> better support for DOCTYPEs.
>
>
>
> I do recall another workaround, which might be acceptable for you. There
> is a repair option that defaults to none. If you change it to full, it
> should allow most of the ISO entities and convert them to the appropriate
> Unicode characters automatically. The full repair might do more than you
> need though, in case the XML is not well-formed.
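
A minimal sketch of the repair option described above, assuming it is
passed as a <repair> element in the xdmp:document-get options node. The
file path is a placeholder borrowed from elsewhere in this thread, and
whether full repair resolves every entity in a given file is not
guaranteed:

```xquery
xquery version "1.0-ml";

(: placeholder path of the XML file on the filesystem; adjust as needed :)
declare variable $file-path := "/tmp/z.xml";

(: ask for full repair instead of the default of none :)
declare variable $load-options :=
  <options xmlns="xdmp:document-get">
    <repair>full</repair>
  </options>;

(: returns the parsed document; nothing is inserted into the database :)
xdmp:document-get($file-path, $load-options)
```

Because full repair may also alter markup that is not well-formed, it
is worth comparing the result against the source file before loading in
bulk.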
>
>
>
> Kind regards,
>
> Geert
>
>
>
> *From:* general-boun...@developer.marklogic.com [mailto:
> general-boun...@developer.marklogic.com] *On behalf of* John Zhong
> *Sent:* Friday, 30 December 2011 18:17
> *To:* General MarkLogic Developer Discussion
> *Subject:* Re: [MarkLogic Dev General] Invalid entity reference "ndash"
>
>
>
> Yes, I made sure the DTD and the associated .ent files are in the correct
> location. What I was saying is that the xdmp:document-get function does
> not work in this case.
>
> Actually, I did a simple test by defining a simple xml and dtd:
>
> test.dtd:
>
> <!ELEMENT test (#PCDATA)>
> <!ENTITY ndash "&#x02013;">
>
> test.xml:
>
> <?xml version="1.0" encoding="utf-8"?>
> <!DOCTYPE test SYSTEM "test.dtd">
> <test>&ndash;</test>
>
> I can open the XML in IE without a problem (it shows an error if I delete
> the entity definition in test.dtd). Then I tested the xdmp:document-get
> function again, and it shows an error:
>
> [1.0-ml] XDMP-DOCENTITYREF: xdmp:document-get("/test.xml") -- Invalid
> entity reference "ndash" at /test.xml line 3
>
> John
>
> On Sat, Dec 31, 2011 at 12:17 AM, Dan Vint <dv...@dvint.com> wrote:
>
> You need to make sure the external text entity is
> being read (or that the file being referenced is
> being found). You may not actually be reading in
> the DTD file itself. The way it is configured is
> correct; your code just needs to make sure it is
> expanding the doctype via the system or public
> identifier (journalpublishing3.dtd, or via XML
> catalog "-//NLM//DTD Journal Publishing DTD v3.0
> 20080202//EN") and associated files before it starts parsing the
> content.
>
> ..dan
>
>
>
> At 07:28 AM 12/30/2011, you wrote:
>>Thank you for your note, Erik.
>>
>>And thank you for your code, Ron. Yes, I did try that and it worked.
>>
>>The XML file already has a DTD declaration,
>><!DOCTYPE article PUBLIC "-//NLM//DTD Journal
>>Publishing DTD v3.0 20080202//EN"
>>"journalpublishing3.dtd">, and the DTD file also
>>references an isopub.ent file that defines this
>>entity: <!ENTITY ndash "&#x02013;" ><!--=en dash -->
>>
>>So, it seems I have to make the entity definition appear in the xml:
>>
>><!DOCTYPE article PUBLIC "-//NLM//DTD Journal
>>Publishing DTD v3.0 20080202//EN" "journalpublishing3.dtd"
>>[<!ENTITY ndash "&#x2013;">]
>> >
>>
>>John
>>
>>On Fri, Dec 30, 2011 at 10:42 PM, Ron Hitchens <r...@ronsoft.com> wrote:
>>
>>Try this code, adjusting the name of your root
>>node as needed. Other entities could also be defined
>>in the doc-type header.
>>
>>This loads the XML first as text, prepends a doc-type
>>header that defines the entity and then parses the result
>>as XML. This requires making an extra copy of the document
>>in memory, so it could bump against memory limits if you
>>do it in volume with lots of large documents.
>>
>>Handy list of entity definitions here:
>>http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
>>
>>xquery version '1.0-ml';
>>
>>(: path of XML doc in filesystem :)
>>declare variable $file-path := "/tmp/z.xml";
>>
>>(: URI to insert it as in MarkLogic :)
>>declare variable $document-uri := "/test/mydoc.xml";
>>
>>(: force loading as text, not as XML :)
>>declare variable $load-options :=
>>   <options xmlns="xdmp:document-get">
>>     <format>text</format>
>>   </options>;
>>
>>(: adjust root node name and add entities as needed :)
>>declare variable $doctype-decl as xs:string :=
>>  '<!DOCTYPE root [<!ENTITY mdash "&#x2014;">]>';
>>
>>let $doc-as-text := xdmp:document-get ($file-path, $load-options)
>>let $doc-with-decl := fn:concat ($doctype-decl, $doc-as-text)
>>let $doc := xdmp:unquote ($doc-with-decl)
>>
>>return xdmp:document-insert ($document-uri, $doc)
>>
>>
>>On Dec 30, 2011, at 6:05 AM, John Zhong wrote:
>>
>> > Thanks for your quick answer, Harry.
>> >
>> > But what if I don't want to modify the original XML?
>> >
>> > Thanks,
>> > John
>> >
>> > On Fri, Dec 30, 2011 at 1:59 PM, Harry B. <dna...@gmail.com> wrote:
>> > Try using the numeric character reference instead
>> >
>> > &#8211;
>> >
>> > I can't remember why, but this usually works.
>> >
>> > On Dec 29, 2011 10:53 PM, "John Zhong" <j...@yuxipacific.com> wrote:
>> > Hi all,
>> >
>> > I am having a problem using the xdmp:document-get function to read
>> > an XML file with the entity reference &ndash;. How can I fix this
>> > problem? I am using ML version 5.0-1.2.
>> >
>> > [1.0-ml] XDMP-DOCENTITYREF:
>> > xdmp:document-get("D:\test.xml") -- Invalid
>> > entity reference "ndash" at D:\test.xml line 231
>> >
>> > Thank you,
>> > John
>> >
>> >
>> > _______________________________________________
>> > General mailing list
>> > General@developer.marklogic.com
>> > http://developer.marklogic.com/mailman/listinfo/general
>>
>>---
>>Ron Hitchens {mailto:r...@ronsoft.com}   Ronsoft Technologies
>>    +44 7879 358 212 (voice)          http://www.ronsoft.com
>>    +1 707 924 3878 (fax)             Bit Twiddling At Its Finest
>>"No amount of belief establishes any fact." -Unknown
>
> ---------------------------------------------------------------------------
> Danny Vint
>
> Panoramic Photography
> http://www.dvint.com
>
> voice: 619-938-3610