Hi Roland,

I've not tested your sample with MLCP, but I did want to quickly raise two
items:

1) Using the Title element as the -aggregate_uri_id may be a problem. There
appear to be different concert events, distinguished by date, city, and
even bands, that share the same title. One of the ampersand titles, for
example, falls into this category: American Cajun, Blues & Zydeco
Festival. There was an event with that Title in Walsrode, Leverkusen,
Mainz, and a number of other cities, all with different dates. With Title
as the identifier, all of those events map to the same document URI, so
only one of them will be stored after MLCP completes the load job.
2) Have you considered allowing MLCP to assign the document URI? You can
achieve this by omitting -aggregate_uri_id.
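
To see how many records would collide before loading, a rough check along these lines might help. It is only a sketch, not something I have run against your file: the element names Festival, Title, City, and Start_date come from your sample, while the root element name, the "Heidenfest" title, and all dates except 24.10.2015 are made-up placeholders.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative data only: the festival title and cities come from the thread;
# the root element name, the second title, and most dates are placeholders.
SAMPLE = """<Festivals>
  <Festival>
    <Title>American Cajun, Blues &amp; Zydeco Festival</Title>
    <City>Walsrode</City>
    <Start_date>01.10.2015</Start_date>
  </Festival>
  <Festival>
    <Title>American Cajun, Blues &amp; Zydeco Festival</Title>
    <City>Leverkusen</City>
    <Start_date>02.10.2015</Start_date>
  </Festival>
  <Festival>
    <Title>American Cajun, Blues &amp; Zydeco Festival</Title>
    <City>Mainz</City>
    <Start_date>03.10.2015</Start_date>
  </Festival>
  <Festival>
    <Title>Heidenfest</Title>
    <City>Oberhausen</City>
    <Start_date>24.10.2015</Start_date>
  </Festival>
</Festivals>"""


def key_counts(root, fields, record="Festival"):
    """Count how often each key (tuple of child-element texts) occurs."""
    counts = Counter()
    for rec in root.iter(record):
        # findtext returns None for a missing child, so fall back to "".
        key = tuple((rec.findtext(field) or "").strip() for field in fields)
        counts[key] += 1
    return counts


def duplicates(counts):
    """Keep only the keys shared by more than one record."""
    return {key: n for key, n in counts.items() if n > 1}


root = ET.fromstring(SAMPLE)
print("Title only:", duplicates(key_counts(root, ("Title",))))
print("Title + City + Start_date:",
      duplicates(key_counts(root, ("Title", "City", "Start_date"))))
```

For the real file, replace ET.fromstring(SAMPLE) with ET.parse(path).getroot(). Any key reported under "Title only" stands for records that would end up under the same URI. The second call only shows whether date and city are enough to tell the events apart; it does not imply that -aggregate_uri_id accepts a combined key.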

HTH,
Kevin

 

On 9/23/15, 8:51 AM, Roland Wenzke wrote:

>Folks,
>I am trying to import an XML document into MarkLogic, where the content
>is an aggregate like the attached festivals_1.xml.
>My import command is:
>mlcp.sh import -host 192.168.1.218 -port 8009 -username cdb -password cdb
>-mode local -document_type xml
>-input_file_path /home/rwenzke/Dokumente/Orgkram/Jobs/Marklogic/data/Festivals_2.xml
>-input_file_type aggregates -aggregate_record_element Festival
>-aggregate_uri_id Title -output_uri_prefix /festivals/Title -output_uri_suffix .xml
>-output_collections concerts -xml_repair_level full
>
>I get multiple error messages like this:
>15/09/23 13:23:40 INFO contentpump.ContentPump: Hadoop library version: 2.6.0
>15/09/23 13:23:40 INFO contentpump.LocalJobRunner: Content type: XML
>15/09/23 13:23:41 INFO input.FileInputFormat: Total input paths to process : 1
>15/09/23 13:23:41 ERROR contentpump.AggregateXMLReader: Parsing error
>javax.xml.stream.XMLStreamException: badly formed xml: no END_TAG after id textLine number = 2048
>Column number = 29
>System Id = null
>Public Id = null
>Location Uri= null
>CharacterOffset = 137403
>
>        at com.marklogic.contentpump.AggregateXMLReader.processStartElement(AggregateXMLReader.java:326)
>        at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:460)
>        at com.marklogic.contentpump.LocalJobRunner$TrackingRecordReader.nextKeyValue(LocalJobRunner.java:435)
>        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
>        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
>        at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:275)
>        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
>        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
>        at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
>        at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:376)
>        at com.marklogic.contentpump.MultithreadedMapper.run(MultithreadedMapper.java:215)
>        at com.marklogic.contentpump.LocalJobRunner$LocalMapTask.call(LocalJobRunner.java:376)
>        at java.util.concurrent.FutureTask.run(Unknown Source)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>        at java.lang.Thread.run(Unknown Source)
>15/09/23 13:23:41 ERROR contentpump.MultithreadedMapper: Parsing error
>java.io.IOException: Parsing error
>        at com.marklogic.contentpump.AggregateXMLReader.nextKeyValue(AggregateXMLReader.java:536)
>        at com.marklogic.contentpump.LocalJobRunner$TrackingRecordReader.nextKeyValue(LocalJobRunner.java:435)
>        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
>        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
>        at com.marklogic.contentpump.MultithreadedMapper$SubMapRecordReader.nextKeyValue(MultithreadedMapper.java:275)
>        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
>        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
>        at com.marklogic.contentpump.BaseMapper.runThreadSafe(BaseMapper.java:45)
>        at com.marklogic.contentpump.MultithreadedMapper$MapRunner.run(MultithreadedMapper.java:376)
>        at com.marklogic.contentpump.MultithreadedMapper.run(MultithreadedMapper.java:215)
>        at com.marklogic.contentpump.LocalJobRunner$LocalMapTask.call(LocalJobRunner.java:376)
>        at java.util.concurrent.FutureTask.run(Unknown Source)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>        at java.lang.Thread.run(Unknown Source)
>Caused by: javax.xml.stream.XMLStreamException: badly formed xml: no END_TAG after id textLine number = 2048
>Column number = 29
>System Id = null
>Public Id = null
>Location Uri= null
>CharacterOffset = 137403
>
>Looking at the referenced positions seems to indicate the issue is caused
>by the encoded ampersand:
>
>       <Festival>
>               <Land>de</Land>
>               <Ort>46047 Oberhausen</Ort>
>               <Start_date>24.10.2015</Start_date>
>               <End_date />
>               <Title>American Cajun, Blues &amp; Zydeco Festival</Title>
>               <ZIP>46047</ZIP>
>               <City>Oberhausen</City>
>               <Artists>
>                       <Artist>Die Apokalyptischen Reiter; Korpiklaani; Varg; Skyforger; Winterstorm; Finntroll; Hämatom</Artist>
>               </Artists>
>               <Link>http://www.heidenfest.eu/</Link>
>               <Description>Die Götter des Nordens erwachen in diesem Herbst abermals zu neuem Leben und senden ihre tapfersten Heidenkrieger zurück an die Fronten, um einen weiteren bombastischen Streifzug durch Europa vorzubereiten. Bereits in diesem Moment rüstet sich eine Meute ungläubiger Barbaren, ehrenvoller Schwertkämpfer und unerschrockener Druiden für ein Schlachtfest der ganz besonderen Art, um die faszinierende Nord-Mythologie längst vergangener Tage anno 2010 fortleben zu lassen. Die Zeichen stehen auf Sturm, denn aus allen dunkel-trüben Winkeln hallen schon die Hörner, die Macht des bevorstehenden Ragnaröks verkündend. Heidenfest erfüllt auch in diesem Jahr die Wünsche der Pagan-Metal-Genießer zu 100 % und lädt ein zu Klang, Feier und Umtrunk mit den zurzeit führendsten Szene-Riesen. Denn im Zuge dieser Festival-Tour wird das Spektrum der heidnischen Tondichtung vollends ausgelotet: Ob schwarzmetallische Epik, dramatische Melodien, zügellose Wut oder folkloristische Ohrwürmer, Heidenfest füttert euch mit den essentiellen Kraftstoffen der Pagan-Kultur.</Description>
>               <Styles>
>                       <Style>Metal; Black Metal; Folk Metal; Power Metal; Pagan Metal; Epic Metal</Style>
>               </Styles>
>       </Festival>
>
>I have created a reduced list of festivals without any ampersands in the
>XML as an example, and that loads perfectly fine into a set of split
>documents. Any help on what's wrong here?
>I am using MarkLogic 8.03 on Win 7 as the backend; mlcp is version 1.3.3
>on Linux, using the Oracle Java 1.8_25 JVM.
>
>-------------------------------------------------------------

_______________________________________________
General mailing list
[email protected]
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general
