Hi,
I'm trying to load a bunch of files into MarkLogic using mlcp, but for some
reason, it seems that it skips some of the files. I'm using a command line
like this:
mlcp.sh import \
-database tx-claims \
-host marklogic -port 8884 -username XXX -password XXX -mode local \
-input_file_path 2015/277ca/ \
-input_file_type aggregates -aggregate_record_element TRANSACTION \
-transform_module /transform-in.xquery \
-transform_function transform-response \
-transform_namespace http://lambdawerk.com/tx-claims
The transform-response function looks like this:
declare function tx-claims:transform-response(
  $content as map:map,
  $context as map:map
) as map:map*
{
  let $doc := map:get($content, 'value')
  let $icn := $doc/TRANSACTION/LOOP2000D/LOOP2200D/TRN/TRN02/text()
  let $uri := concat('/responses/', $icn, '.xml')
  return
    (map:put($content, 'uri', $uri),
     $content)
};
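Note that this function derives the URI from TRN02 alone, so two records
with the same TRN02 value map to the same URI, and the later insert replaces
the earlier document. A rough way to check the input for repeated values is
sketched below. The function name duplicate_trn02 is hypothetical, and the
check is a text-level approximation rather than a real XML parse: it matches
every unprefixed, attribute-free <TRN02> element, not only those under
LOOP2000D/LOOP2200D/TRN.

```shell
# List TRN02 elements that occur more than once under a directory of
# input files. Hypothetical helper; approximates the XPath with a regex.
duplicate_trn02() {
  # -r: recurse, -h: no file names, -o: print only the matched text
  grep -rhoE '<TRN02>[^<]*</TRN02>' "$1" | sort | uniq -d
}
```

Calling it as duplicate_trn02 2015/277ca/ prints one line per repeated
element; empty output means no repeats were found at this level of
approximation.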
The mlcp output looks like this at the end:
16/07/03 18:59:22 INFO contentpump.LocalJobRunner: completed 76%
16/07/03 18:59:28 INFO contentpump.LocalJobRunner: completed 77%
16/07/03 18:59:30 INFO contentpump.LocalJobRunner: completed 78%
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: completed 80%
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: completed 81%
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.ContentPumpStats:
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 1421471
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 1421471
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 1404192
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: Total execution time: 3270 sec
After the load operation completes, nothing unusual is in the ErrorLog.txt
file. However, when I look into the database, I find that some files are
missing. When I load one of the missing files into the database explicitly
(specifying its name as the -input_file_path argument), it is loaded
correctly.
Now, the mlcp output looks kind of fishy to me in that it apparently jumps
from 81% to finished in under one second, as if it were skipping a whole
bunch of files. It also seems that some output records could not be
written: OUTPUT_RECORDS_COMMITTED (1404192) is lower than OUTPUT_RECORDS
(1421471), even though OUTPUT_RECORDS_FAILED is 0. The manual says that
this could be caused by a server-side transformation, but our function
does not seem to be at fault: when I load a missing file by specifying its
file name, it is loaded correctly, so the cause seems to be something else.
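The gap between the two counters can be read straight from the log. The
sketch below assumes the mlcp output was saved to a file with each counter
and its value on a single line; the function name uncommitted_records and
the file name mlcp.log are hypothetical.

```shell
# Print OUTPUT_RECORDS minus OUTPUT_RECORDS_COMMITTED for an mlcp log
# read from stdin. Assumes counter name and value share one line.
uncommitted_records() {
  awk '
    # The second-to-last field is the counter name; the last is its value.
    NF >= 2 && $(NF-1) == "OUTPUT_RECORDS:"           { out = $NF }
    NF >= 2 && $(NF-1) == "OUTPUT_RECORDS_COMMITTED:" { committed = $NF }
    END { print out - committed }
  '
}
```

Run as uncommitted_records < mlcp.log. For the counters shown above this
prints 17279, the number of output records that were never committed.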
I would greatly appreciate any ideas or advice.
Thanks!
Hans