Dear Users and Developers,

I'm a newcomer to MarkLogic and am currently evaluating ML for use in our business. We need to periodically import large CSV files of data provided by the US government. Each file contains a header line with the field names, followed by many records of content.
Currently we are not happy with the execution time of the import via mlcp. Some information about the file to import:

- ca. 5 million records
- 329 fields per record
- about 5 GB of data

The import creates one XML document per record in ML. Functionally it looks fine so far. I used an mlcp call like this:

mlcp-8.0-5/bin/mlcp.sh import -host marklogic -port 8383 \
    -username user -password password \
    -input_file_type delimited_text -document_type xml \
    -delimited_root_name root -delimiter "," \
    -output_uri_prefix /testImport/ -output_uri_suffix .xml \
    -input_file_path largeFile.csv

This import runs around 4 hours and 45 minutes, which is much longer than we expected. My questions are:

- Are there any mlcp options that would decrease the execution time? I tried some options, such as splitting the input file as described in the User Guide, but without any performance gain; rather the opposite. (A sketch of the call with the tuning options I plan to try next is in the P.S. below.)
- What are others' experiences with importing CSV files this large into ML?
- Are there other ways to import such large files?

I think I need a hint here. If any other information is needed, please let me know. Thank you in advance.

Best Regards,
Jörg Teubert
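
P.S. In case it helps the discussion, this is the variant of my call with the tuning options I found in the mlcp User Guide (-split_input, -thread_count, -batch_size, -transaction_size) that I plan to try next. The values are guesses for our hardware, not measured optima:

# same call as above, plus the tuning options from the mlcp User Guide
mlcp-8.0-5/bin/mlcp.sh import -host marklogic -port 8383 \
    -username user -password password \
    -input_file_type delimited_text -document_type xml \
    -delimited_root_name root -delimiter "," \
    -output_uri_prefix /testImport/ -output_uri_suffix .xml \
    -input_file_path largeFile.csv \
    -split_input true \
    -thread_count 16 \
    -batch_size 200 \
    -transaction_size 20

If I read the guide correctly, -thread_count defaults to 4, so raising it toward the number of cores on the host should help, and -split_input true lets mlcp chunk the delimited file itself so all threads stay busy; -batch_size and -transaction_size (defaults 100 and 10) control how many documents go into each request and how many requests into each transaction. Corrections welcome if I misunderstood any of these.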
