Yes, csv2xml and xmlsh are great for this. The problem with CSV to XML to ML, and with "uber" tools in general, is that it's more complicated than it looks at first sight. Technically it's easy - IF you let the developer decide all the details for you. Which are never what you want. This is *exactly* the same issue as importing from SQL.
When you want to load CSV into ML, the first thing to think about is not how to get CSV into XML, but how you want your document structure to look.

Does the CSV file become one big XML doc? One doc per row? Do the values go into attributes? Elements? Both? Where do the names of the values come from? The CSV headers? What if those are not good XML names (QNames)? Do you need to merge in different data to de-normalize your docs? (It is very common for a CSV file to be part of a package of CSV files, or to contain duplicate rows to represent hierarchies.) That requires post-processing of the entire result set.

So first, think about how you want your final docs to look. Then think about how to load them. Is this a one-off small CSV file? A HUGE file (GB+)? Will it create millions of docs or tens? How important is it to load fast? These considerations all call for different approaches.

OK, you figured it all out. The tools are all there; you just need to either pick one that by amazing grace chose all the details exactly how you wanted, or glue something together to do it your way. You could load the CSV to the server and then do all the transforming and reloading there, or you can preprocess it just so and then push it to the server exactly how you want it. Both are valid, but I suggest pre-processing the docs is easier and often faster ... though it depends on your skills and tools, on the size of the data, and on what you're doing with it.

This is what xmlsh excels at. Instead of trying to do one thing, it lets you split the problem into manageable pieces. Once you have figured out your document design, you can glue it together with xmlsh.

1) Get the CSV into SOME kind of XML. Nothing fancy, just something you can use XML tools on. csv2xml has many options to control this ...
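As an aside, two of the design questions above (one doc per row, and what to do with headers that are not good XML names) can be made concrete with a minimal Python sketch. This is not xmlsh and is not how csv2xml itself derives names; the helper names `to_xml_name` and `rows_to_docs`, the ASCII-only naming rule, and the `col` fallback prefix are all assumptions made up for illustration.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET


def to_xml_name(header, fallback="col"):
    """Turn a CSV header into an ASCII-only XML element name (hypothetical rule)."""
    # Replace every run of characters outside [A-Za-z0-9_.-] with one underscore.
    name = re.sub(r"[^A-Za-z0-9_.-]+", "_", header.strip()).strip("_")
    # An XML name may not be empty or start with a digit, '.' or '-'.
    if not name:
        return fallback
    if not re.match(r"[A-Za-z_]", name):
        return f"{fallback}_{name}"
    return name


def rows_to_docs(csv_text, row_tag="row"):
    """Yield one small XML document (as a string) per CSV data row."""
    reader = csv.reader(io.StringIO(csv_text))
    # The first CSV row is assumed to be the header row.
    names = [to_xml_name(h) for h in next(reader)]
    for cells in reader:
        row = ET.Element(row_tag)
        for name, value in zip(names, cells):
            # Values go into child elements here; attributes would be the other choice.
            ET.SubElement(row, name).text = value
        yield ET.tostring(row, encoding="unicode")


if __name__ == "__main__":
    sample = "Account ID,2014 total,name\n42,10.5,Acme & Co\n"
    for doc in rows_to_docs(sample):
        print(doc)
```

Even this toy version has to make decisions for you: it does nothing about duplicate header names, and it silently drops cells beyond the header width. That is the point of thinking about the document design first.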
A common csv2xml invocation is

    csv2xml -header

This will create a single rooted document <root> with rows <row> and child elements <element> for each cell, where the tag name for each cell is created by converting the header column into a QName. It's a reasonable first start.

2) Then suppose you want each row turned into its own document - xsplit to the rescue:
http://www.xmlsh.org/CommandXsplit
xsplit is particularly good on this document structure (a root element with repeating children). By default it will create files with ugly names like x1.xml, x2.xml.

3) If you want to rename them based on something in the document, you could then run xmove:
http://www.xmlsh.org/CommandXmove

4) Now the files probably need some tweaking, so you might want to run an XSLT or XQuery on them to fix them up:
http://www.xmlsh.org/CommandXslt
http://www.xmlsh.org/CommandXquery

5) Now you have a directory of files ready to upload. The put command can do this:
http://www.xmlsh.org/MarkLogicPut
Or you can use the excellent tool mlcp:
https://developer.marklogic.com/products/mlcp

So the whole process could look as simple as this:

    csv2xml < file.csv | ml:put -uri /myfile.xml

or like something more realistic:

    csv2xml -header < file.csv | xslt -f translate.xsl | xsplit -n -o temp
    xmove -x /row/account_id *.xml
    ml:put -baseuri /accounts -maxthreads 10 -maxfiles 100 -collection mycollect *.xml

And if you get really fancy you can actually stream this all and avoid temporary files, but it's a bit trickier.

Anyway ... lots of ways to skin the cat!

From: [email protected] [mailto:[email protected]] On Behalf Of Jakob Fix
Sent: Monday, February 17, 2014 11:17 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] csv load

Hi,

http://www.xmlsh.org/CommandCsv2xml (never used it myself, but it seems to do what you're looking for); note though that you would have to add a loading task after it, which is also available via xmlsh.

I'm sure David Lee can explain this more eloquently.

cheers,
Jakob.
On Mon, Feb 17, 2014 at 5:00 PM, Erik van der Hoeven <[email protected]> wrote:

Gentlemen,

Does anybody know a way to load a CSV file into a MarkLogic database?

Met vriendelijke groeten/With kind regards,

Erik van der Hoeven
Consultant Business Intelligence
DIKW CONSULTING BV
Einsteinbaan 12
3439 NJ Nieuwegein
M: 06-43029943
E: [email protected]

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
