Hi,

just to let you know: the problem that I had was caused entirely by the
fact that I was loading files in parallel that depended on each other, by
way of the loader transformation that I posted.  The mlcp percentage
display is still confusing, though, as it apparently shows the percentage
of the input data that was loaded into the database, not the percentage of
records read from the input.  That could be improved, I think, but it does
not seem to be very important.
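
For anyone who prefers to compare the counters instead of relying on the
percentage display, here is a minimal sketch.  It assumes the mlcp output
has been captured with one log entry per line (not wrapped the way the mail
client wrapped the quoted excerpt below).  The function name
uncommitted_records is made up for illustration and is not part of mlcp;
only the counter names are taken from the mlcp output.

```shell
#!/bin/sh
# Hypothetical helper (not part of mlcp): reads mlcp log output on stdin and
# prints OUTPUT_RECORDS minus OUTPUT_RECORDS_COMMITTED.
uncommitted_records() {
  awk '
    # The count is the last whitespace-separated field of each counter line.
    / OUTPUT_RECORDS: /           { written = $NF }
    / OUTPUT_RECORDS_COMMITTED: / { committed = $NF }
    END { print written - committed }
  '
}

# Example input: the counter lines from the run quoted below, unwrapped.
uncommitted_records <<'EOF'
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 1421471
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 1421471
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 1404192
16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
EOF
```

For the counters shown, this prints 17279, the number of records that were
written but not reported as committed.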

Thank you Indy and Geert for looking at this!
-Hans

On Sun, Jul 3, 2016 at 7:52 PM, Hans Hübner <[email protected]>
wrote:

> Hi,
>
> I'm trying to load a bunch of files into MarkLogic using mlcp, but for
> some reason, it seems that it skips some of the files.  I'm using a command
> line like this:
>
> mlcp.sh import \
>      -database tx-claims \
>      -host marklogic -port 8884 -username XXX -password XXX -mode local \
>      -input_file_path 2015/277ca/ \
>      -input_file_type aggregates -aggregate_record_element TRANSACTION \
>      -transform_module /transform-in.xquery \
>      -transform_function transform-response \
>      -transform_namespace http://lambdawerk.com/tx-claims
>
> The transform-response function looks like this:
>
> declare function tx-claims:transform-response(
> $content as map:map,
> $context as map:map
> ) as map:map*
> {
>   let $doc := map:get($content, 'value')
>   let $icn := $doc/TRANSACTION/LOOP2000D/LOOP2200D/TRN/TRN02/text()
>   let $uri := concat('/responses/', $icn, '.xml')
>   return
>     (map:put($content, 'uri', $uri),
>     $content)
> };
>
> The mlcp output looks like this at the end:
>
> 16/07/03 18:59:22 INFO contentpump.LocalJobRunner:  completed 76%
> 16/07/03 18:59:28 INFO contentpump.LocalJobRunner:  completed 77%
> 16/07/03 18:59:30 INFO contentpump.LocalJobRunner:  completed 78%
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner:  completed 80%
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner:  completed 81%
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.ContentPumpStats:
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 1421471
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 1421471
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 1404192
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
> 16/07/03 18:59:31 INFO contentpump.LocalJobRunner: Total execution time: 3270 sec
>
> After the load operation completes, nothing unusual is in the ErrorLog.txt
> file.  However, when I look into the database, I find that some files are
> missing.  When I load one of the missing files into the database explicitly
> (specifying its name as -input_file_path argument), it is correctly loaded.
>
> Now, the mlcp output looks kind of fishy to me in that it apparently loads
> the last 19% of the work in under one second.  It seems that it is skipping
> a whole bunch of files.  It also seems that some output records could not
> be written.  The manual says that this could be caused by a server-side
> transformation, but our function does not seem to be at fault: when I load
> the missing file specifying its file name, it is correctly loaded, so it
> seems to be something else.
>
> I would greatly appreciate any ideas or advice.
>
> Thanks!
> Hans
>



-- 
LambdaWerk GmbH
Oranienburger Straße 87/89
10178 Berlin
Phone: +49 30 555 7335 0
Fax: +49 30 555 7335 99

HRB 169991 B Amtsgericht Charlottenburg
USt-ID: DE301399951
Geschäftsführer:  Hans Hübner

http://lambdawerk.com/