Dear Users and Developers,

I'm a newcomer to MarkLogic and am currently evaluating ML for use in our business. We need to periodically import large CSV files of data provided by the US government. Each file contains a header line with the field names, followed by many records of content.
Currently we are not happy with the execution time of the import via mlcp. Some information about the file to import:

- ca. 5 million records
- 329 fields per record
- about 5 GB of data

The import creates one XML document per record in ML. Functionally it looks fine so far. I used an mlcp call like this:

mlcp-8.0-5/bin/mlcp.sh import -host marklogic -port 8383 \
    -username user -password password \
    -input_file_type delimited_text -document_type xml \
    -delimited_root_name root -delimiter "," \
    -output_uri_prefix /testImport/ -output_uri_suffix .xml \
    -input_file_path largeFile.csv

This import runs around 4 hours and 45 minutes, which is much longer than we expected. My questions are:

- Are there any mlcp options that would decrease the execution time? I tried some options, such as splitting the input file as described in the User Guide, but without any performance gain; rather the opposite. (A sketch of the call with the tuning options I plan to try next is in the P.S. below.)
- What are others' experiences with importing CSV files this large into ML?
- Are there other ways to import such large files?

I think I need a hint here. If any other information is needed, please let me know. Thank you in advance.

Best Regards,
Jörg Teubert
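
P.S. In case it helps the discussion, this is the variant of my call with the tuning options I found in the mlcp User Guide (-split_input, -thread_count, -batch_size, -transaction_size) that I plan to try next. The values are guesses for our hardware, not measured optima:

# same call as above, plus the tuning options from the mlcp User Guide
mlcp-8.0-5/bin/mlcp.sh import -host marklogic -port 8383 \
    -username user -password password \
    -input_file_type delimited_text -document_type xml \
    -delimited_root_name root -delimiter "," \
    -output_uri_prefix /testImport/ -output_uri_suffix .xml \
    -input_file_path largeFile.csv \
    -split_input true \
    -thread_count 16 \
    -batch_size 200 \
    -transaction_size 20

If I read the guide correctly, -thread_count defaults to 4, so raising it toward the number of cores on the host should help, and -split_input true lets mlcp chunk the delimited file itself so all threads stay busy; -batch_size and -transaction_size (defaults 100 and 10) control how many documents go into each request and how many requests into each transaction. Corrections welcome if I misunderstood any of these.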
