Hi, just to let you know: the problem I had was caused entirely by loading files in parallel that depended on each other, by way of the loader transformation that I posted. The mlcp percentage display is still confusing, though: it apparently shows the percentage of the input data that was loaded into the database, not the number of records read from the input. That could be improved, I think, but it does not seem very important.
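For anyone who runs into something similar: a quick way to see how many records never made it into the database is to compare the OUTPUT_RECORDS and OUTPUT_RECORDS_COMMITTED counters in the mlcp summary. This is only a rough sketch. It assumes each counter sits on a single log line, and the helper name `uncommitted` and the temporary sample file are made up for illustration; the sample lines are the ones from my original message quoted below.

```shell
#!/bin/sh
# Sample summary lines copied from the mlcp output in the quoted message.
# A real run would point at the saved mlcp output instead (assumption).
log=$(mktemp)
cat > "$log" <<'EOF'
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 1421471
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 1421471
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 1404192
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
EOF

# Hypothetical helper: print how many output records were not committed.
uncommitted() {
    awk '
        NF < 2 { next }   # skip blank or one-word lines
        $(NF-1) == "OUTPUT_RECORDS:"           { out = $NF }
        $(NF-1) == "OUTPUT_RECORDS_COMMITTED:" { committed = $NF }
        END { print out - committed }
    ' "$1"
}

uncommitted "$log"   # prints 17279 for the sample above
rm -f "$log"
```

A non-zero result with OUTPUT_RECORDS_FAILED at 0 only says that records went missing between output and commit; it does not say why.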
Thank you Indy and Geert for looking at this!

-Hans

On Sun, Jul 3, 2016 at 7:52 PM, Hans Hübner <[email protected]> wrote:

> Hi,
>
> I'm trying to load a bunch of files into MarkLogic using mlcp, but for
> some reason, it seems that it skips some of the files. I'm using a command
> line like this:
>
> mlcp.sh import \
>   -database tx-claims \
>   -host marklogic -port 8884 -username XXX -password XXX -mode local \
>   -input_file_path 2015/277ca/ \
>   -input_file_type aggregates -aggregate_record_element TRANSACTION \
>   -transform_module /transform-in.xquery \
>   -transform_function transform-response \
>   -transform_namespace http://lambdawerk.com/tx-claims
>
> The transform-response function looks like this:
>
> declare function tx-claims:transform-response(
>   $content as map:map,
>   $context as map:map
> ) as map:map*
> {
>   let $doc := map:get($content, 'value')
>   let $icn := $doc/TRANSACTION/LOOP2000D/LOOP2200D/TRN/TRN02/text()
>   let $uri := concat('/responses/', $icn, '.xml')
>   return
>     (map:put($content, 'uri', $uri),
>      $content)
> };
>
> The mlcp output looks like this at the end:
>
> 16/07/03 18:59:22 INFO contentpump.LocalJobRunner: completed 76%
> 16/07/03 18:59:28 INFO contentpump.LocalJobRunner: completed 77%
> 16/07/03 18:59:30 INFO contentpump.LocalJobRunner: completed 78%
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: completed 80%
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: completed 81%
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.ContentPumpStats:
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 1421471
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 1421471
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 1404192
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: Total execution time: 3270 sec
>
> After the load operation completes, nothing unusual is in the ErrorLog.txt
> file. However, when I look into the database, I find that some files are
> missing. When I load one of the missing files into the database explicitly
> (specifying its name as -input_file_path argument), it is correctly loaded.
>
> Now, the mlcp output looks kind of fishy to me in that it apparently loads
> the last 19% of the work in under one second. It seems that it is skipping
> a whole bunch of files. It also seems that some output records could not
> be written. The manual says that this could be caused by a server-side
> transformation, but our function does not seem to be at fault - when I load
> the missing file specifying its file name, it is correctly loaded, so it
> seems to be something else.
>
> I would greatly appreciate any ideas or advice.
>
> Thanks!
> Hans
>

--
LambdaWerk GmbH
Oranienburger Straße 87/89
10178 Berlin
Phone: +49 30 555 7335 0
Fax: +49 30 555 7335 99

HRB 169991 B Amtsgericht Charlottenburg
USt-ID: DE301399951
Geschäftsführer: Hans Hübner

http://lambdawerk.com/
