David,
Those are interesting questions. At the moment I capture Twitter streams from the Netherlands. Those Twitter messages are automatically inserted into an ML database (Social DB). I have created separate range indexes (on elements, because the messages are in XML format). Then I created a view, so it is easy to get an overview of all users, messages and locations.

Now I want to integrate this with the Electronic Program Guide (a CSV file). So I thought that if I have two views, I can easily integrate and combine them.

The CSV file has the following structure:

Program;Date From, Date Until; Chanel ; #Usergroep

The question is: how can I do this in the most efficient way?

With kind regards,

*Erik van der Hoeven*
*Consultant Business Intelligence*
DIKW CONSULTING BV
Einsteinbaan 12
3439 NJ Nieuwegein
M: 06-43029943
E: [email protected]

On Mon, Feb 17, 2014 at 6:49 PM, David Lee <[email protected]> wrote:

> Yes csv2xml and xmlsh are great for this.
>
> The problem with csv to xml to ml and with "uber" tools in general is that it's more complicated than it looks at first sight.
>
> Technically it's easy - IF you let the developer decide for you all the details. Which are never what you want.
>
> This is the same issue *exactly* as importing from SQL.
>
> When you want to load CSV to ML .. the first step you should think about is not how to get CSV into XML,
> but how do I want my document structure to look ?
>
> Does the CSV file become one big XML doc ? One doc per row ? Do the values go into attributes ? elements ? both ?
> Where to get the names of the values ? CSV headers ? What if those are not good XML names (QNames) ?
> Do you need to merge in different data to de-normalize your docs ? (very common for CSV to be part of a package of CSV files,
> or for it to contain duplicate rows to represent hierarchies) - this requires post-processing of the entire result set.
>
> So first ... think how do you want your final docs to look.
> Then think ... how to load them.
>
> Is this a one-off small CSV file ? a HUGE file (GB+) ? Will it create millions of docs or 10s ... how important is it to load fast ?
> All these are considerations that take different approaches.
>
> OK you figured it all out ... the tools are all there, you just need to either pick one that by amazing grace picked for you all
> the details exactly how you wanted, or you have to glue something together to do it your way.
>
> You could load the CSV to the server then do all the transforming and reloading there,
> or you can preprocess it just so and then push it to the server exactly how you want it.
> Both are valid, but I suggest pre-processing the docs is easier and often faster ...
> but it depends on your skills and tools ... and also the sizes of the data ... and what you're doing with it.
>
> This is what xmlsh excels at. Instead of trying to do one thing ... it lets you split the problem into manageable pieces.
>
> Once you figured out your document design ... you can glue it together with xmlsh
>
> 1) Get the CSV into SOME kind of XML. Nothing fancy but something .... so you can use xml tools.
> csv2xml has many options to control this ... a common one is
>
>     csv2xml -header
>
> This will create a single rooted document <root> with rows <row> and child elements <element> for each cell, where the tag names
> for the cell are created by converting the header columns into QNames.
> It's a reasonable first start ..
>
> Then suppose you want each row turned into its own document - xsplit to the rescue
> http://www.xmlsh.org/CommandXsplit
>
> xsplit is particularly good on this document structure (a root element with repeating children)
>
> By default it will create files with ugly names like x1.xml, x2.xml.
> If you want to rename them based on something in the document you could then run xmove
> http://www.xmlsh.org/CommandXmove
>
> Now the files probably need some tweaking so you might want to run an xslt or xquery on them to fix them up
> http://www.xmlsh.org/CommandXslt
> http://www.xmlsh.org/CommandXquery
>
> Now you have a directory of files ready to upload ...
> the put command can do this
> http://www.xmlsh.org/MarkLogicPut
> Or you can use the excellent tool mlcp
> https://developer.marklogic.com/products/mlcp
>
> So the whole process could look as simple as this
>
>     csv2xml < file.csv | ml:put -uri /myfile.xml
>
> To something more realistic
>
>     csv2xml -header < file.csv | xslt -f translate.xsl | xsplit -n -o temp
>     xmove -x /row/account_id *.xml
>     ml:put -baseuri /accounts -maxthreads 10 -maxfiles 100 -collection mycollect *.xml
>
> And if you get really fancy you can actually stream this all and avoid temporary files, but it's a bit trickier.
>
> Anyway ... lots of ways to skin the cat !
>
> *From:* [email protected] [mailto:[email protected]] *On Behalf Of* Jakob Fix
> *Sent:* Monday, February 17, 2014 11:17 AM
> *To:* MarkLogic Developer Discussion
> *Subject:* Re: [MarkLogic Dev General] csv load
>
> Hi,
>
> http://www.xmlsh.org/CommandCsv2xml (never used it myself, but it seems to do what you're looking for); note though that you
> would have to add a loading task after it, which is also available via xmlsh. I'm sure David Lee can explain this more eloquently.
>
> cheers,
> Jakob.
>
> On Mon, Feb 17, 2014 at 5:00 PM, Erik van der Hoeven <[email protected]> wrote:
>
> Gentlemen,
>
> Does anybody know a way to load a CSV file into a MarkLogic database ?
>
> With kind regards,
>
> *Erik van der Hoeven*
> *Consultant Business Intelligence*
> DIKW CONSULTING BV
> Einsteinbaan 12
> 3439 NJ Nieuwegein
> M: 06-43029943
> E: [email protected]
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
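For readers who want to try the "CSV with a header row to one XML document per row" idea from the quoted reply without xmlsh, here is a minimal sketch in Python. It is not csv2xml or xsplit and does not claim to reproduce their output. The assumptions are mine: the file is semicolon-delimited throughout, the function names `to_xml_name` and `rows_to_documents` and the `program` row element are hypothetical, the header cleanup is a simple stand-in for a real QName conversion, and the sample row is made up for illustration.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET


def to_xml_name(header: str) -> str:
    """Turn a CSV header into a usable XML element name (simplified, assumed rule)."""
    # Replace runs of characters that are not safe in an element name with "_".
    name = re.sub(r"[^A-Za-z0-9_.-]+", "_", header.strip()).strip("_")
    # An element name must not be empty or start with a digit, "." or "-".
    if not name or not (name[0].isalpha() or name[0] == "_"):
        name = "_" + name
    return name


def rows_to_documents(csv_text: str, delimiter: str = ";", row_tag: str = "program"):
    """Return one serialized XML document (as a string) per CSV data row."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    headers = [to_xml_name(h) for h in next(reader)]
    docs = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        root = ET.Element(row_tag)
        for name, value in zip(headers, row):
            ET.SubElement(root, name).text = value.strip()
        docs.append(ET.tostring(root, encoding="unicode"))
    return docs


if __name__ == "__main__":
    # Hypothetical sample data using the column names from the message above.
    sample = (
        "Program;Date From;Date Until;Chanel;#Usergroep\n"
        "News;2014-02-17 18:00;2014-02-17 18:30;NPO1;news\n"
    )
    for doc in rows_to_documents(sample):
        print(doc)
```

Each returned string could then be written to its own file and loaded with ml:put or mlcp as described in the quoted reply. Whether one document per program row is the right shape depends on how the guide data should be combined with the tweets.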
