Yes, csv2xml and xmlsh are great for this.
The problem with CSV to XML to ML, and with "uber" tools in general, is that
it's more complicated than it looks at first sight.
Technically it's easy, IF you let the tool's developer decide all the details
for you, which are never what you want.
This is *exactly* the same issue as importing from SQL.

When you want to load CSV into ML, the first step you should think about is
not how to get CSV into XML,
but how you want your document structure to look.
Does the CSV file become one big XML doc? One doc per row? Do the values go
into attributes? Elements? Both?
Where do the names of the values come from? CSV headers? What if those are not
good XML names (QNames)?
Do you need to merge in different data to de-normalize your docs? (It is very
common for a CSV file to be part of a package of CSV files,
or for it to contain duplicate rows to represent hierarchies.) This requires
post-processing of the entire result set.
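To make the naming question concrete, here is a minimal Python sketch (not
xmlsh, and not csv2xml's actual algorithm) of one possible way to turn CSV
headers into usable element names. The rules it applies (replace invalid
characters with "_", prefix "_" when a name starts badly or with "xml", number
duplicates) are my own assumptions, and it only handles ASCII names.

```python
import re

# ASCII-only approximation of an XML NCName; the XML spec also allows many
# non-ASCII characters, which this sketch simply replaces with "_".
_INVALID_CHARS = re.compile(r"[^A-Za-z0-9._-]")
_VALID_START = re.compile(r"[A-Za-z_]")


def header_to_name(header: str) -> str:
    """Turn one CSV header cell into an element name (hypothetical rules)."""
    name = _INVALID_CHARS.sub("_", header.strip())
    # Names must start with a letter or "_"; names starting with "xml"
    # (in any case) are reserved by the XML spec.
    if not _VALID_START.match(name) or name.lower().startswith("xml"):
        name = "_" + name
    return name


def headers_to_names(headers):
    """Map a whole header row to element names, numbering any duplicates."""
    used = set()
    names = []
    for header in headers:
        base = header_to_name(header)
        name, n = base, 1
        while name in used:  # keep bumping the suffix until the name is unique
            n += 1
            name = f"{base}_{n}"
        used.add(name)
        names.append(name)
    return names
```

Whatever rules you pick, decide them up front, because every later step
(XPath expressions, renaming, queries in ML) depends on those names.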

So first, think about how you want your final docs to look. Then think about
how to load them.
Is this a one-off small CSV file? A HUGE file (GB+)? Will it create millions
of docs or tens? How important is it to load fast?
Each of these considerations calls for a different approach.

OK, you've figured it all out. The tools are all there; you just need to
either pick one that, by amazing grace, decided all the details exactly the
way you wanted, or glue something together to do it your way.

You could load the CSV into the server and then do all the transforming and
reloading there,
or you can preprocess it just so and then push it to the server exactly how
you want it.
Both are valid, but I'd suggest that pre-processing the docs is easier and
often faster,
though it depends on your skills and tools, the size of the data,
and what you're doing with it.

This is what xmlsh excels at. Instead of trying to do it all in one step, it
lets you split the problem into manageable pieces.

Once you have figured out your document design, you can glue it together with
xmlsh.


1) Get the CSV into SOME kind of XML. Nothing fancy, just something, so you
can use XML tools.
csv2xml has many options to control this; a common one is:

   csv2xml -header

This will create a single document with a root element <root>, one <row>
element per row, and one child element per cell, where the tag name of each
cell is created by converting the corresponding header column into a QName.
It's a reasonable first start.
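For illustration only, a rough Python equivalent of that structure, built with
the standard csv and xml.etree.ElementTree modules. It is not csv2xml: the
helper names are hypothetical and the header-to-name rule is a crude
assumption.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET


def to_name(header: str) -> str:
    # Crude header -> element-name rule (assumption, not csv2xml's behavior).
    name = re.sub(r"[^A-Za-z0-9._-]", "_", header.strip())
    return name if re.match(r"[A-Za-z_]", name) else "_" + name


def csv_to_xml(text: str) -> ET.Element:
    """Build <root><row><header>value</header>...</row>...</root> from CSV text."""
    reader = csv.reader(io.StringIO(text))
    names = [to_name(h) for h in next(reader)]  # first line is the header
    root = ET.Element("root")
    for record in reader:
        row = ET.SubElement(root, "row")
        # zip() stops at the shorter list: extra cells are dropped and
        # missing cells simply produce no element.
        for name, value in zip(names, record):
            ET.SubElement(row, name).text = value
    return root
```

For example, ET.tostring(csv_to_xml("account_id,Full Name\n42,Ann\n"),
encoding="unicode") gives
<root><row><account_id>42</account_id><Full_Name>Ann</Full_Name></row></root>.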

Then suppose you want each row turned into its own document: xsplit to the
rescue.
http://www.xmlsh.org/CommandXsplit

xsplit is particularly good with this document structure (a root element
with repeating children).
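The idea behind the split step, sketched in Python rather than xmlsh (the
function name and the "x" prefix are illustrative assumptions): every child of
the root element is written out as its own document, under numbered names like
x1.xml, x2.xml.

```python
import os
import xml.etree.ElementTree as ET


def split_children(xml_text: str, out_dir: str, prefix: str = "x") -> list:
    """Write each child of the root element to its own file: x1.xml, x2.xml, ..."""
    root = ET.fromstring(xml_text)
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, child in enumerate(root, start=1):
        child.tail = None  # drop whitespace that followed the child in the parent
        path = os.path.join(out_dir, f"{prefix}{i}.xml")
        ET.ElementTree(child).write(path, encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths
```

This reads the whole document into memory, which is fine for small files but
not for the GB+ case.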

By default it will create files with ugly names like x1.xml, x2.xml.
If you want to rename them based on something in the document, you could
then run xmove
http://www.xmlsh.org/CommandXmove
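A similar Python sketch of the renaming step, roughly what
"xmove -x /row/account_id *.xml" is used for in the pipeline further down. It
is hypothetical, not xmove's implementation: "field" is a path relative to the
root element (ElementTree only supports a subset of XPath), and unsafe
characters in the value are replaced with "_".

```python
import os
import re
import xml.etree.ElementTree as ET


def rename_by_field(paths, field: str) -> list:
    """Rename each XML file to "<value of field>.xml" in the same directory."""
    new_paths = []
    for path in paths:
        value = (ET.parse(path).getroot().findtext(field) or "").strip()
        if not value:
            raise ValueError(f"{path}: no value for {field!r}")
        # Keep the value filename-safe (e.g. no "/" sneaking into the path).
        safe = re.sub(r"[^A-Za-z0-9._-]", "_", value)
        target = os.path.join(os.path.dirname(path), safe + ".xml")
        if target != path and os.path.exists(target):
            raise FileExistsError(target)  # duplicate key: don't overwrite
        os.rename(path, target)
        new_paths.append(target)
    return new_paths
```

Refusing to overwrite matters here: duplicate keys in the CSV are common (see
the hierarchy point above), and silently losing rows is worse than failing.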

Now the files probably need some tweaking, so you might want to run XSLT or
XQuery on them to fix them up:
http://www.xmlsh.org/CommandXslt
http://www.xmlsh.org/CommandXquery
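What a fix-up looks like depends entirely on your document design. As one
hypothetical example, sketched in Python with ElementTree instead of XSLT or
XQuery: promote account_id to an attribute, rename <row> to <account>, and
drop empty cells.

```python
import xml.etree.ElementTree as ET


def fix_up(xml_text: str) -> str:
    """Hypothetical fix-up: promote account_id to an attribute, rename the
    root element, and drop empty cells."""
    row = ET.fromstring(xml_text)
    row.tag = "account"
    for child in list(row):  # iterate over a copy because we remove children
        text = (child.text or "").strip()
        if child.tag == "account_id":
            row.set("id", text)
            row.remove(child)
        elif not text and len(child) == 0:
            row.remove(child)  # empty cell with no children
    return ET.tostring(row, encoding="unicode")
```

For example, fix_up("<row><account_id>42</account_id><name>Ann</name><fax></fax></row>")
returns <account id="42"><name>Ann</name></account>.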

Now you have a directory of files ready to upload.
The put command can do this:
http://www.xmlsh.org/MarkLogicPut
Or you can use the excellent tool mlcp:

https://developer.marklogic.com/products/mlcp
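If neither tool fits, another route (not one of the tools above, so treat it
as a swapped-in alternative) is MarkLogic's REST API, which stores a document
with PUT /v1/documents?uri=...&collection=... . A minimal Python sketch,
assuming a REST API instance at http://localhost:8000 with digest
authentication; the host, port, and function names are placeholders.

```python
import os
import urllib.parse
import urllib.request

# Assumption: a MarkLogic REST API instance listening here, using digest auth.
BASE_URL = "http://localhost:8000"


def build_put(path: str, base_uri: str, collection: str) -> urllib.request.Request:
    """Build PUT /v1/documents storing one file at <base_uri>/<filename>."""
    uri = base_uri.rstrip("/") + "/" + os.path.basename(path)
    query = urllib.parse.urlencode({"uri": uri, "collection": collection})
    with open(path, "rb") as f:
        body = f.read()
    return urllib.request.Request(
        f"{BASE_URL}/v1/documents?{query}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/xml"},
    )


def upload(paths, base_uri: str, collection: str, user: str, password: str) -> None:
    """Send the documents one at a time (no parallelism, unlike -maxthreads)."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, BASE_URL, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))
    for path in paths:
        # opener.open() raises HTTPError for non-2xx responses.
        with opener.open(build_put(path, base_uri, collection)) as resp:
            resp.read()
```

One request per document is slow for millions of docs; that is exactly where
ml:put with -maxthreads or mlcp earns its keep.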

So the whole process could be as simple as this:

csv2xml < file.csv | ml:put -uri /myfile.xml

or something more realistic:

csv2xml -header < file.csv | xslt -f translate.xsl | xsplit -n -o temp
xmove -x /row/account_id *.xml
ml:put -baseuri /accounts -maxthreads 10 -maxfiles 100 -collection mycollect *.xml

And if you get really fancy, you can actually stream all of this and avoid
temporary files, but it's a bit trickier.
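One way to picture the streaming approach, again as a Python sketch rather
than xmlsh: a generator reads the CSV one row at a time and yields a
(URI, XML) pair per row, so nothing is written to temporary files and memory
use stays flat even for a huge file. It assumes the headers are already valid
XML names and that "key" (e.g. account_id) is one of them; both are
simplifying assumptions.

```python
import csv
import xml.etree.ElementTree as ET
from typing import Iterator, TextIO, Tuple


def stream_docs(csv_file: TextIO, key: str, base_uri: str) -> Iterator[Tuple[str, str]]:
    """Yield one (uri, xml) pair per CSV row, reading the file lazily.

    Assumes the header names are already valid XML names and that `key`
    is one of them (hypothetical simplifications).
    """
    for record in csv.DictReader(csv_file):
        row = ET.Element("row")
        for name, value in record.items():
            if name is not None:  # DictReader files surplus cells under None
                ET.SubElement(row, name).text = value
        uri = f"{base_uri.rstrip('/')}/{record[key]}.xml"
        yield uri, ET.tostring(row, encoding="unicode")
```

Each pair can be handed to an uploader as soon as it is produced. The catch is
that anything needing the whole result set (merging files, de-normalizing
hierarchies) no longer fits this row-at-a-time model.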


Anyway... lots of ways to skin the cat!










From: [email protected] 
[mailto:[email protected]] On Behalf Of Jakob Fix
Sent: Monday, February 17, 2014 11:17 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] csv load

Hi,

http://www.xmlsh.org/CommandCsv2xml (never used it myself, but it seems to do 
what you're looking for); note though that you would have to add a loading task 
after it which is also available via xmlsh. I'm sure David Lee can explain this 
more eloquently.

cheers,
Jakob.

On Mon, Feb 17, 2014 at 5:00 PM, Erik van der Hoeven 
<[email protected]<mailto:[email protected]>> wrote:
Gentlemen,

Does anybody know a way to load a CSV file into a MarkLogic database?



With kind regards,

Erik van der Hoeven
Consultant Business Intelligence

DIKW CONSULTING BV
Einsteinbaan 12
3439 NJ Nieuwegein
M: 06-43029943
E: [email protected]<mailto:[email protected]>

_______________________________________________
General mailing list
[email protected]<mailto:[email protected]>
http://developer.marklogic.com/mailman/listinfo/general
