Hey Hans,

I'm not completely sure, but could you try loading the files again with the
thread count set to 1 in your mlcp import config file and see whether the
results change?

-thread_count
1
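
In case it helps, here is a rough sketch of what I mean. It assumes you pass
the settings through an mlcp options file (-options_file), with each option
and each value on its own line. The file name import-options.txt is just a
placeholder I made up, and everything other than -thread_count is copied from
your command line. I have not run this against your setup, so treat it as a
starting point only.

```shell
# Write the import options to a file, one option or one value per line.
# import-options.txt is a hypothetical file name, not something mlcp requires.
cat > import-options.txt <<'EOF'
-database
tx-claims
-host
marklogic
-port
8884
-mode
local
-input_file_path
2015/277ca/
-input_file_type
aggregates
-aggregate_record_element
TRANSACTION
-transform_module
/transform-in.xquery
-transform_function
transform-response
-transform_namespace
http://lambdawerk.com/tx-claims
-thread_count
1
EOF

# Run the import only when mlcp.sh is on the PATH.
# Credentials stay on the command line so they are not stored in the file.
if command -v mlcp.sh >/dev/null 2>&1; then
  mlcp.sh import -options_file import-options.txt -username XXX -password XXX
fi
```

With a single thread the load will be slower, but if the INPUT_RECORDS and
OUTPUT_RECORDS_COMMITTED numbers then match, that would point at a
concurrency issue rather than at the input files.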

Regards,
Indy

On Sun, Jul 3, 2016 at 11:22 PM, Hans Hübner <[email protected]>
wrote:

> Hi,
>
> I'm trying to load a bunch of files into MarkLogic using mlcp, but for
> some reason, it seems that it skips some of the files.  I'm using a command
> line like this:
>
> mlcp.sh import \
>      -database tx-claims \
>      -host marklogic -port 8884 -username XXX -password XXX -mode local \
>      -input_file_path 2015/277ca/ \
>      -input_file_type aggregates -aggregate_record_element TRANSACTION \
>      -transform_module /transform-in.xquery \
>      -transform_function transform-response \
>      -transform_namespace http://lambdawerk.com/tx-claims
>
> The transform-response function looks like this:
>
> declare function tx-claims:transform-response(
> $content as map:map,
> $context as map:map
> ) as map:map*
> {
>   let $doc := map:get($content, 'value')
>   let $icn := $doc/TRANSACTION/LOOP2000D/LOOP2200D/TRN/TRN02/text()
>   let $uri := concat('/responses/', $icn, '.xml')
>   return
>     (map:put($content, 'uri', $uri),
>     $content)
> };
>
> The mlcp output looks like this at the end:
>
> 16/07/03 18:59:22 INFO contentpump.LocalJobRunner:  completed 76%
> 16/07/03 18:59:28 INFO contentpump.LocalJobRunner:  completed 77%
> 16/07/03 18:59:30 INFO contentpump.LocalJobRunner:  completed 78%
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner:  completed 80%
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner:  completed 81%
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.ContentPumpStats:
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 1421471
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 1421471
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 1404192
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: Total execution time: 3270 sec
>
> After the load operation completes, nothing unusual is in the ErrorLog.txt
> file.  However, when I look into the database, I find that some files are
> missing.  When I load one of the missing files into the database explicitly
> (specifying its name as the -input_file_path argument), it is correctly
> loaded.
>
> Now, the mlcp output looks kind of fishy to me in that it apparently loads
> the last 19% of the work in under one second.  It seems that it is skipping
> a whole bunch of files.  It also seems that some output records could not
> be written.  The manual says that this could be caused by a server-side
> transformation, but our function does not seem to be at fault: when I load
> the missing file specifying its file name, it is correctly loaded, so it
> seems to be something else.
>
> I would greatly appreciate any ideas or advice.
>
> Thanks!
> Hans
>
> _______________________________________________
> General mailing list
> [email protected]
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
>
>