Re: [MarkLogic Dev General] URI_ID whitespace problems with mlcp

Florent Georges Wed, 22 Mar 2017 07:10:46 -0700

Hi,

That is indeed the most likely explanation.  Just to make it clear to the
OP: in such a situation an XML parser MUST stop normal processing (see e.g.
http://w3.org/TR/xml/#sec-terminology), because having "<a b>" where a
start tag is expected ultimately violates the document production rule.
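
To illustrate, here is a minimal sketch using Python's standard library
parser (not mlcp itself, which may report the error differently).  The
sample strings and the is_well_formed helper are made up for the example;
the point is only that a conforming parser rejects a name with a space in
it, while the same text read as a tag plus an attribute is fine.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text, False on a fatal error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples, not taken from the OP's actual file.
broken = "<row><row _id>1234</row _id></row>"    # space inside the tag name
repaired = "<row><row_id>1234</row_id></row>"    # legal element name
attribute = '<row><uri _id="1234"/></row>'       # "_id" as a real attribute

print(is_well_formed(broken))
print(is_well_formed(repaired))
print(is_well_formed(attribute))
```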


When it comes to XML (in general, not only with MarkLogic), sometimes
working around validity might be the right solution, depending on the
technical and non-technical context.  But having ill-formed documents never
is.  Fixing ill-formedness is always less painful than any other solution.
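
If the producer cannot be persuaded to fix the export, a one-off repair
pass could look roughly like the sketch below.  It assumes the broken tags
are literally "<row _id>" and "</row _id>" (guessed from the -uri_id value
in the original command) and that "row_id" is an acceptable replacement
name; both are assumptions, so adjust them to the real data.

```python
import re

# Assumption: the ill-formed tags are exactly "<row _id>" and "</row _id>".
BROKEN_TAG = re.compile(r"<(/?)row _id>")


def repair(text: str) -> str:
    """Rewrite the broken start and end tags to the legal name row_id."""
    return BROKEN_TAG.sub(r"<\1row_id>", text)


sample = "<row><row _id>7</row _id><speed>55</speed></row>"
print(repair(sample))
```

For a large aggregate file the same substitution can be applied line by
line while streaming instead of reading everything into memory.  After the
repair, the identifier element would presumably be named with
-uri_id row_id in the mlcp command.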

Just my 2 cents.  Regards,

-- 
Florent Georges
H2O Consulting
http://h2o.consulting/


On 22 March 2017 at 14:14, Martijn Sintemaartensdijk wrote:

> Dear Lucas,
>
> judging from your command, I think your input file contains an
> XML-starttag "<uri _id>" and corresponding endtag "</uri _id>".
> Unfortunately, XML tag names may not contain spaces (see also:
> https://www.w3.org/TR/2008/REC-xml-20081126/#NT-Name).
>
> MLCP tries to interpret the xml-file and it reports an unexpected
> character, ">". MLCP assumes "_id" to be an attribute name to the tag name
> "uri", like <uri _id="1234">. The next character following "_id" is
> therefore expected to be an equal sign.
>
> I would advise you to request that the output file be delivered in
> accordance with the XML specification, rather than trying to fix the
> document yourself. Otherwise, I fear, you will be forced to use sed, or
> something similar, to replace the malformed XML tags throughout the entire
> document each and every time you receive a new version.
>
> Met vriendelijke groet / Kind regards,
>
>
>
> *Martijn Sintemaartensdijk*
>
> *A:* Einsteinbaan 12, 3439 NJ Nieuwegein
>
> *T:* (+31) 06 40 59 09 36
>
> *E:* martijn.sintemaartensd...@dikw.com
>
> *W:* www.dikw.nl
>
>
> On 21 March 2017 at 19:02, Lucas Davenport <nonameacco...@gmail.com>
> wrote:
>
>> I am a newb, so forgive me if I missed this answer while searching.
>>
>> I am testing ML 8 for a project at work and we have a requirement to load
>> large amounts of historical data. I've read the mlcp documentation and can
>> successfully import some test data, but the problem I am facing is the
>> archive data has a space in the record identifier.
>>
>> My command is:
>>  mlcp.sh import -host localhost -port 8006 -username dataload -password
>> dataload -mode local -input_file_path ../xml/MD2014aggregate.xml
>> -input_file_type aggregates -aggregate_record_element row -uri_id "row _id"
>> -output_uri_prefix /traffic/MD -output_uri_suffix .xml -output_collections
>> published
>>
>> This produces the following error:
>> 17/03/21 13:49:20 ERROR contentpump.ContentPump: Unrecognized argument:
>> \_id
>>
>> I've escaped both the space and the underscore (row\ _id and row\ \_id)
>> and still get the same error. I've also wrapped it in single quotes and
>> double quotes.
>>
>> I'm trying to keep from having to use sed to remove the space between row
>> and _id in the entire file.
>>
>> Is there a way to make mlcp see the URI_ID literally as "row _id"?
>>
>> Thanks in advance.
>>
>> _______________________________________________
>> General mailing list
>> General@developer.marklogic.com
>> Manage your subscription at:
>> http://developer.marklogic.com/mailman/listinfo/general
>>
>>
>
