Hey Hans,

I'm not completely sure, but could you try loading the files again with the
thread count set to 1 in your mlcp import config file and see what the
results are?

-thread_count 1

Regards,
Indy

On Sun, Jul 3, 2016 at 11:22 PM, Hans Hübner <[email protected]> wrote:

> Hi,
>
> I'm trying to load a bunch of files into MarkLogic using mlcp, but for
> some reason, it seems that it skips some of the files. I'm using a command
> line like this:
>
> mlcp.sh import \
>   -database tx-claims \
>   -host marklogic -port 8884 -username XXX -password XXX -mode local \
>   -input_file_path 2015/277ca/ \
>   -input_file_type aggregates -aggregate_record_element TRANSACTION \
>   -transform_module /transform-in.xquery \
>   -transform_function transform-response \
>   -transform_namespace http://lambdawerk.com/tx-claims
>
> The transform-response function looks like this:
>
> declare function tx-claims:transform-response(
>   $content as map:map,
>   $context as map:map
> ) as map:map*
> {
>   let $doc := map:get($content, 'value')
>   let $icn := $doc/TRANSACTION/LOOP2000D/LOOP2200D/TRN/TRN02/text()
>   let $uri := concat('/responses/', $icn, '.xml')
>   return
>     (map:put($content, 'uri', $uri),
>      $content)
> };
>
> The mlcp output looks like this at the end:
>
> 16/07/03 18:59:22 INFO contentpump.LocalJobRunner: completed 76%
> 16/07/03 18:59:28 INFO contentpump.LocalJobRunner: completed 77%
> 16/07/03 18:59:30 INFO contentpump.LocalJobRunner: completed 78%
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: completed 80%
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: completed 81%
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.ContentPumpStats:
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 1421471
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 1421471
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 1404192
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: Total execution time: 3270 sec
>
> After the load operation completes, nothing unusual is in the ErrorLog.txt
> file. However, when I look into the database, I find that some files are
> missing. When I load one of the missing files into the database explicitly
> (specifying its name as the -input_file_path argument), it is correctly
> loaded.
>
> Now, the mlcp output looks kind of fishy to me in that it apparently loads
> the last 19% of the work in under one second. It seems that it is skipping
> a whole bunch of files. It also seems that some output records could not
> be written. The manual says that this could be caused by a server-side
> transformation, but our function does not seem to be at fault - when I load
> the missing file specifying its file name, it is correctly loaded, so it
> seems to be something else.
>
> I would greatly appreciate any ideas or advice.
>
> Thanks!
> Hans
>
> _______________________________________________
> General mailing list
> [email protected]
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
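
To put a number on the "output records could not be written" observation, the two counters in the quoted log can be compared directly. This is only a minimal shell sketch: it assumes the counter lines have been pasted into a variable with the mail wrapping undone, and the `counter` helper is a made-up name for illustration, not part of mlcp.

```shell
# Counter lines copied from the mlcp output quoted above
# (assumption: mail line wrapping has been undone by hand).
log='16/07/03 18:59:31 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 1421471
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 1421471
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 1404192
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0'

# Hypothetical helper: print the value of the named counter.
# The counter name is the second-to-last field, the value is the last field.
counter() {
  printf '%s\n' "$log" | awk -v name="$1:" '$(NF-1) == name { print $NF }'
}

output=$(counter OUTPUT_RECORDS)
committed=$(counter OUTPUT_RECORDS_COMMITTED)

# Records that were output but not reported as committed.
missing=$((output - committed))
echo "uncommitted records: $missing"
```

With the figures from the log above, the difference between OUTPUT_RECORDS and OUTPUT_RECORDS_COMMITTED comes out to 17279, even though OUTPUT_RECORDS_FAILED is reported as 0.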
