Jakob,

Sorry about the unresponsiveness. It looks like you may have uncovered a bug. I'll have an engineer on our end try to reproduce this and contact you offline for more details, and I'll follow up on the list with the eventual outcome.

Thanks for your feedback.
Justin

Justin Makeig
Director, Product Management
MarkLogic Corporation
[email protected]
www.marklogic.com

On Nov 6, 2013, at 7:12 AM, Jakob Fix <[email protected]> wrote:

Hello,

I was just wondering whether the silence in response to my question is "stunned", "not interested", "well, can't he figure this out by himself" or "oops, didn't see this one"? :-)

cheers,
Jakob.

On Fri, Nov 1, 2013 at 12:01 AM, Jakob Fix <[email protected]> wrote:

Hi,

we've run into something we think might be a bug in the most recent version of mlcp. We exported a database containing XML documents and lots of binary documents, then imported the exported data into another database. In the second step, i.e. the import into the new database, the error below appeared (see the line starting with "Archive damaged ...").

Apparently, mlcp stores XML and binary documents in separate zip files, and each binary document gets its own metadata document. In our case, the export created two zip files containing the binaries. For some reason, for one document the binary file and its metadata file ended up in different zip files:

20131031140432+0100-000001-BINARY.zip ==> RO-GE_DTC.pdf.metadata
20131031140432+0100-000002-BINARY.zip ==> RO-GE_DTC.pdf

This seems to have caused the error below, and the PDF file is indeed not loaded into the database. Putting the PDF file back into the same binary zip file as its metadata file made the import run without errors.

thanks,
Jakob.
marklogic-contentpump-1.0.3\bin\mlcp.bat EXPORT -host 192.168.56.90 -port 50000 -username abc -password abc -output_type archive -output_file_path db-prod-20131031

marklogic-contentpump-1.0.3\bin\mlcp.bat IMPORT -host 192.168.56.90 -port 40100 -username abc -password abc -input_file_path db-prod-20131031 -input_file_type archive

13/10/31 14:09:54 INFO contentpump.LocalJobRunner: Content type: XML
13/10/31 14:09:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
13/10/31 14:09:54 INFO input.FileInputFormat: Total input paths to process : 3
13/10/31 14:09:55 ERROR contentpump.ArchiveRecordReader: Archive damaged: no/incorrect metadata for /content/assets/agreements/RO-GE_DTC.pdf in /D:/Projects/EOI/deployment/mlcp/eoi-db-prod-20131032/20131031140432+0100-000002-BINARY.zip
13/10/31 14:09:55 ERROR contentpump.LocalJobRunner: Error running task: attempt__0000_m_000001_0
java.lang.NullPointerException
    at com.marklogic.contentpump.DatabaseContentWriter.write(DatabaseContentWriter.java:231)
    at com.marklogic.contentpump.DatabaseContentWriter.write(DatabaseContentWriter.java:58)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at com.marklogic.contentpump.DocumentMapper.map(DocumentMapper.java:46)
    at com.marklogic.contentpump.DocumentMapper.map(DocumentMapper.java:32)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at com.marklogic.contentpump.LocalJobRunner$LocalMapTask.call(LocalJobRunner.java:375)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
13/10/31 14:09:56 INFO contentpump.LocalJobRunner: completed 0%
13/10/31 14:14:42 INFO contentpump.LocalJobRunner: completed 33%
13/10/31 14:15:41 WARN contentpump.DatabaseContentWriter:
SEC-PERMDENIED: Permission denied
13/10/31 14:18:27 INFO contentpump.LocalJobRunner: completed 66%
13/10/31 14:18:27 INFO contentpump.LocalJobRunner: com.marklogic.contentpump.ContentPumpStats:
13/10/31 14:18:27 INFO contentpump.LocalJobRunner: ATTEMPTED_INPUT_RECORD_COUNT: 20230
13/10/31 14:18:27 INFO contentpump.LocalJobRunner: SKIPPED_INPUT_RECORD_COUNT: 0
13/10/31 14:18:27 INFO contentpump.LocalJobRunner: Total execution time: 512 sec

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
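Until this is resolved, a split like the one Jakob describes could be spotted before running the import. Below is a minimal Python sketch (not part of mlcp) that scans an export directory for `*-BINARY.zip` files and reports every content entry whose `.metadata` companion is missing or sits in a different zip. It assumes, based only on the listing in Jakob's message, that a document's metadata entry is named by appending `.metadata` to the content entry's name; the script name `check_archive.py` and the functions `index_binary_archives` and `find_split_pairs` are hypothetical.

```python
import sys
import zipfile
from pathlib import Path

# Assumption taken from the listing in the thread: "X.metadata" belongs to "X".
METADATA_SUFFIX = ".metadata"


def index_binary_archives(export_dir):
    """Map every zip entry name to the list of BINARY zip files containing it."""
    locations = {}
    for zip_path in sorted(Path(export_dir).glob("*-BINARY.zip")):
        with zipfile.ZipFile(zip_path) as zf:
            for name in zf.namelist():
                locations.setdefault(name, []).append(zip_path.name)
    return locations


def find_split_pairs(locations):
    """Return (entry, content_zips, metadata_zips) for each content entry that
    does not share a zip file with its metadata entry (or has none at all)."""
    problems = []
    for name, zips in locations.items():
        if name.endswith(METADATA_SUFFIX):
            continue  # metadata entries are checked via their content entry
        meta_zips = locations.get(name + METADATA_SUFFIX, [])
        if not set(zips) & set(meta_zips):
            problems.append((name, zips, meta_zips))
    return problems


def main(argv):
    if len(argv) != 2:
        print("usage: check_archive.py EXPORT_DIR", file=sys.stderr)
        return 2
    problems = find_split_pairs(index_binary_archives(argv[1]))
    for name, zips, meta_zips in problems:
        print(f"{name}: content in {zips}, metadata in {meta_zips or 'none'}")
    # Non-zero exit status signals that at least one pair is split.
    return 1 if problems else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

For example, `python check_archive.py db-prod-20131031` would list any affected documents. The script only reports; per Jakob's report, moving the separated entries back into the same binary zip file was what let the import complete.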
