I am a bit of a newb with ML, so please forgive me if I missed this
solution while searching the archives.

Is it possible to use regular expressions with the mlcp aggregate import
process to break apart XML wherein the root element for each destination
document has an attribute?

The documentation states that if the data looks like the following example:

<?xml version="1.0" encoding="UTF-8"?>
<people xmlns="http://marklogic.com/examples">...</people>

MLCP will not ingest documents unless you set
-aggregate_record_namespace to "http://marklogic.com/examples".

However, my data looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<people id="186c370b-97c2-429f-b4c0-33d99248c16a">...</people>
<people id="e8f84d79-dcb5-4003-baf7-1fdf88063100"> ... </people>
<people id="93ffb99d-9f56-4551-bf3b-bba439eb1c7b">...</people>
<people id="4f8d9702-4eae-4498-b466-c4423fef2933">...</people>

I'd like to use a regular expression like
 'id="[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*"' as the value for
-aggregate_record_namespace to break up the aggregate during ingestion into
the document chunks we'd like to have within ML.
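
To make the intent concrete, here is a small Python sketch of the matching I
have in mind. It is not mlcp and says nothing about what mlcp accepts; it only
shows that the pattern above picks out the id attribute of each record in my
sample. The names ID_ATTR, SAMPLE and find_record_ids are made up for this
illustration.

```python
import re

# Pattern copied from the question. Whether mlcp accepts a regular expression
# for any of its options is exactly what is being asked; this only
# demonstrates the matching itself.
ID_ATTR = re.compile(r'id="[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*-[0-9a-f]*"')

# The four sample records from the question, bodies elided as in the original.
SAMPLE = (
    '<people id="186c370b-97c2-429f-b4c0-33d99248c16a">...</people>\n'
    '<people id="e8f84d79-dcb5-4003-baf7-1fdf88063100"> ... </people>\n'
    '<people id="93ffb99d-9f56-4551-bf3b-bba439eb1c7b">...</people>\n'
    '<people id="4f8d9702-4eae-4498-b466-c4423fef2933">...</people>\n'
)


def find_record_ids(text):
    """Return every id="..." attribute in text that matches the pattern."""
    return ID_ATTR.findall(text)


matches = find_record_ids(SAMPLE)
for m in matches:
    print(m)
```

Each match would mark the start of one destination document.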

Is this possible or just a pipe dream?

Thanks in advance,
Lucas