David,

Those are interesting questions.


At the moment I am capturing Twitter streams from the Netherlands. These
Twitter messages are automatically inserted into a MarkLogic database
(Social DB). I have set up separate element range indexes on them (the
messages are in XML format). Then I created a view, so it is easy to get an
overview of all users, messages and locations. Now I want to integrate this
with the Electronic Program Guide (a CSV file). So I thought that with two
views I could easily integrate and combine them.


The CSV file has the following structure:


Program; Date From; Date Until; Channel; #Usergroup
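To make the EPG easy to combine with the tweets, one option is to parse the CSV into records first. A minimal Python sketch, assuming the file is semicolon-delimited throughout, has a header row with the column names listed above, and keeps the dates as plain strings; the sample row and the `read_epg` name are hypothetical:

```python
import csv
import io

# Hypothetical sample in the layout described above (semicolon-delimited).
SAMPLE = """Program;Date From;Date Until;Channel;#Usergroup
Example Show;2014-02-17T20:00;2014-02-17T20:25;Channel 1;#exampleshow
"""


def read_epg(text):
    """Parse semicolon-delimited EPG text into a list of dicts keyed by trimmed header names."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = [h.strip() for h in next(reader)]
    rows = []
    for record in reader:
        if not any(cell.strip() for cell in record):
            continue  # skip blank lines
        rows.append({k: v.strip() for k, v in zip(header, record)})
    return rows


if __name__ == "__main__":
    for row in read_epg(SAMPLE):
        print(row)
```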


The question is: how can I do this in the most efficient way?
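One way to approach the efficiency question is to match each tweet against the programs by hashtag and air time using an index, instead of scanning every program for every tweet. A Python sketch, assuming each program has been reduced to a hypothetical record with `tag`, `start` and `end` keys (datetimes), each tweet to one with `text` and `created`, and that programs sharing a hashtag never overlap in time:

```python
import bisect
import re
from collections import defaultdict
from datetime import datetime

HASHTAG = re.compile(r"#\w+")


def build_index(programs):
    """Group programs by lower-cased hashtag; sort each group by start time for binary search."""
    index = defaultdict(list)
    for program in programs:
        index[program["tag"].lower()].append(program)
    starts = {}
    for tag, group in index.items():
        group.sort(key=lambda p: p["start"])
        starts[tag] = [p["start"] for p in group]
    return dict(index), starts


def match_tweet(tweet, index, starts):
    """Return programs whose hashtag occurs in the tweet and whose [start, end) window contains it."""
    matches = []
    for tag in {t.lower() for t in HASHTAG.findall(tweet["text"])}:
        if tag not in index:
            continue
        # Latest program with this hashtag that started at or before the tweet.
        i = bisect.bisect_right(starts[tag], tweet["created"]) - 1
        if i >= 0 and tweet["created"] < index[tag][i]["end"]:
            matches.append(index[tag][i])
    return matches


if __name__ == "__main__":
    programs = [{"name": "Example Show", "tag": "#ExampleShow",
                 "start": datetime(2014, 2, 17, 20, 0), "end": datetime(2014, 2, 17, 20, 25)}]
    index, starts = build_index(programs)
    tweet = {"text": "Watching #exampleshow", "created": datetime(2014, 2, 17, 20, 10)}
    print([p["name"] for p in match_tweet(tweet, index, starts)])
```

Building the index sorts each hashtag group once; after that, each tweet costs one dictionary lookup and one binary search per hashtag it contains.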



With kind regards,

*Erik van der Hoeven *
*Consultant Business Intelligence*

DIKW CONSULTING BV
Einsteinbaan 12
3439 NJ Nieuwegein
M: 06-43029943
E: [email protected]


On Mon, Feb 17, 2014 at 6:49 PM, David Lee <[email protected]> wrote:

> Yes, csv2xml and xmlsh are great for this.
>
> The problem with CSV to XML to ML, and with "uber" tools in general, is
> that it's more complicated than it looks at first sight.
>
> Technically it's easy, IF you let the developer decide all the details
> for you, which are never what you want.
>
> This is the same issue *exactly* as importing from SQL.
>
>
>
> When you want to load CSV into ML, the first thing to think about is not
> how to get CSV into XML, but how you want your document structure to look.
>
> Does the CSV file become one big XML doc? One doc per row? Do the values
> go into attributes? Elements? Both?
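To make the elements-versus-attributes choice concrete, here is a small Python sketch (function names hypothetical) that renders the same CSV row, given as a dict of column name to value, as one document either way. It assumes the column names are already valid XML names; which layout suits you depends on how you will query and index the data:

```python
import xml.etree.ElementTree as ET


def row_as_elements(row):
    """One child element per column: <row><id>1</id>...</row>."""
    doc = ET.Element("row")
    for name, value in row.items():
        ET.SubElement(doc, name).text = value
    return ET.tostring(doc, encoding="unicode")


def row_as_attributes(row):
    """One attribute per column: <row id="1" ... />."""
    return ET.tostring(ET.Element("row", row), encoding="unicode")


if __name__ == "__main__":
    row = {"id": "1", "channel": "Channel 1"}
    print(row_as_elements(row))
    print(row_as_attributes(row))
```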
>
> Where do the names of the values come from? CSV headers? What if those
> are not good XML names (QNames)?
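A minimal sketch of one way to turn arbitrary header text into a usable element name, in Python. The rule used here (keep ASCII letters, digits, `.`, `-` and `_`, replace everything else with `_`, and prefix `_` if the name would not start with a letter or underscore) is an assumption for illustration, not what csv2xml does:

```python
import re


def to_xml_name(header):
    """Turn a CSV header into a conservative ASCII XML name (NCName)."""
    # Replace every run of characters outside [A-Za-z0-9._-] with one underscore.
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", header.strip())
    # A name must start with a letter or underscore, so prefix one if needed.
    if not re.match(r"[A-Za-z_]", name):
        name = "_" + name
    return name


if __name__ == "__main__":
    for h in ["Date From", "#Usergroup", "1st column"]:
        print(repr(h), "->", to_xml_name(h))
```

This is deliberately conservative (XML also allows many non-ASCII letters in names), and two different headers can map to the same name, so a real loader should check for collisions.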
>
> Do you need to merge in different data to de-normalize your docs? (It is
> very common for a CSV file to be part of a package of CSV files, or for it
> to contain duplicate rows to represent hierarchies.) This requires
> post-processing of the entire result set.
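For the duplicate-rows case, here is a Python sketch of such a post-processing pass. It assumes hypothetical rows (dicts) in which a key column repeats and some `parent_fields` are identical across the repeats; each group becomes one document with the shared fields once and the varying fields in repeated `<item>` children. It keeps every document in memory until the whole input has been read, which is why this step needs the entire result set:

```python
import xml.etree.ElementTree as ET


def group_rows(rows, key, parent_fields):
    """Fold rows that share `key` into one document per key value."""
    docs = {}
    for row in rows:
        k = row[key]
        if k not in docs:
            # First time we see this key: create the document with the shared fields once.
            doc = ET.Element("record")
            for field in parent_fields:
                ET.SubElement(doc, field).text = row[field]
            docs[k] = doc
        # Every row (including the first) contributes one <item> with its remaining fields.
        item = ET.SubElement(docs[k], "item")
        for field, value in row.items():
            if field not in parent_fields:
                ET.SubElement(item, field).text = value
    return docs
```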
>
>
>
> So first, think about how you want your final docs to look. Then think
> about how to load them.
>
> Is this a one-off small CSV file? A HUGE file (GB+)? Will it create
> millions of docs or tens? How important is it to load fast?
>
> Each of these considerations calls for a different approach.
>
>
> OK, you figured it all out. The tools are all there; you just need to
> either pick one that, by amazing grace, picked all the details exactly how
> you wanted, or glue something together to do it your way.
>
>
>
> You could load the CSV to the server and then do all the transforming and
> reloading there, or you can preprocess it just so and then push it to the
> server exactly how you want it.
>
> Both are valid. I find pre-processing the docs easier and often faster,
> but it depends on your skills and tools, on the size of the data, and on
> what you're doing with it.
>
>
>
> This is what xmlsh excels at. Instead of trying to do everything in one
> step, it lets you split the problem into manageable pieces.
>
>
>
> Once you have figured out your document design, you can glue it together
> with xmlsh:
>
>
>
> 1) Get the CSV into SOME kind of XML. Nothing fancy, but something, so
> you can use XML tools.
>
> csv2xml has many options to control this; a common one is
>
>
>
>    csv2xml -header
>
>
>
> This will create a single-rooted document <root> with rows <row> and
> child elements (one per cell), where the tag names for the cells are
> created by converting the header columns into QNames.
>
> It's a reasonable first start.
>
>
>
> Then, suppose you want each row turned into its own document: xsplit to
> the rescue.
>
> http://www.xmlsh.org/CommandXsplit
>
>
>
> xsplit is particularly good on this document structure (a root element
> with repeating children).
>
>
>
> By default it will create files with ugly names like x1.xml, x2.xml.
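To show what the split step amounts to, here is a Python sketch that writes each child of the root element to its own file with the same x1.xml, x2.xml style of names. It only mirrors the default behavior described above; `split_children` and its `prefix` parameter are hypothetical and not xsplit's options:

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def split_children(xml_text, out_dir, prefix="x"):
    """Write each child of the root element to its own file: x1.xml, x2.xml, ..."""
    root = ET.fromstring(xml_text)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, child in enumerate(root, start=1):
        child.tail = None  # drop whitespace that followed the child in the parent
        path = out / f"{prefix}{i}.xml"
        ET.ElementTree(child).write(str(path), encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        doc = "<root><row><a>1</a></row><row><a>2</a></row></root>"
        print([p.name for p in split_children(doc, tmp)])
```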
>
> If you want to rename them based on something in the document, you could
> then run xmove:
>
> http://www.xmlsh.org/CommandXmove
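The renaming idea can be sketched in Python as well: read each file, take the text of a child of the root element (like /row/account_id), and rename the file to that value. This assumes the values are unique and safe to use as file names; `rename_by_element` is a hypothetical helper, not xmove's implementation:

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def rename_by_element(paths, child_path):
    """Rename each file to <value>.xml, where value is the text at child_path under the root element."""
    renamed = []
    for path in map(Path, paths):
        key = (ET.parse(path).getroot().findtext(child_path) or "").strip()
        if not key:
            raise ValueError(f"{path}: no value at {child_path!r}")
        target = path.with_name(f"{key}.xml")
        if target.exists():
            raise FileExistsError(target)  # refuse to overwrite a file that already has that name
        path.rename(target)
        renamed.append(target)
    return renamed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "x1.xml"
        src.write_text("<row><account_id>42</account_id></row>", encoding="utf-8")
        print([p.name for p in rename_by_element([src], "account_id")])
```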
>
>
>
> Now the files probably need some tweaking, so you might want to run an
> XSLT or XQuery on them to fix them up:
>
> http://www.xmlsh.org/CommandXslt
>
> http://www.xmlsh.org/CommandXquery
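As an illustration of the kind of fix-up such a step might do, here is a Python sketch that trims text and rewrites date fields into ISO 8601 form, which is the lexical form an xs:dateTime range index needs. The input date format (`%d-%m-%Y %H:%M`) and the field names are assumptions; with xmlsh you would express the same logic as an XSLT or XQuery:

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def fix_up(xml_text, date_fields, in_format="%d-%m-%Y %H:%M"):
    """Trim text content and rewrite the given date fields as ISO 8601 (xs:dateTime-style) values."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.text is not None:
            elem.text = elem.text.strip()
        if elem.tag in date_fields and elem.text:
            elem.text = datetime.strptime(elem.text, in_format).isoformat()
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(fix_up("<row><Date_From> 17-02-2014 20:00 </Date_From></row>", {"Date_From"}))
```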
>
>
>
> Now you have a directory of files ready to upload. The put command can do
> this:
>
> http://www.xmlsh.org/MarkLogicPut
>
> Or you can use the excellent tool mlcp
>
>
>
> https://developer.marklogic.com/products/mlcp
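For a sense of what a single upload looks like underneath, here is a Python sketch against the MarkLogic REST API (`PUT /v1/documents` with `uri` and `collection` parameters). It assumes a REST instance on port 8000 (the default App-Services port on MarkLogic 7 and later), digest authentication, and a user allowed to write documents; the host, credentials and helper names are placeholders, and this is not how ml:put or mlcp are implemented:

```python
import urllib.parse
import urllib.request


def build_put_request(host, uri, body, collections=(), port=8000):
    """Build a REST API request that stores `body` at document `uri`, optionally in collections."""
    params = [("uri", uri)] + [("collection", c) for c in collections]
    url = f"http://{host}:{port}/v1/documents?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/xml"},
    )


def put_document(request, user, password):
    """Send the request with HTTP digest authentication and return the HTTP status code."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, request.full_url, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))
    with opener.open(request) as response:
        return response.status


if __name__ == "__main__":
    req = build_put_request("localhost", "/accounts/42.xml",
                            "<row><account_id>42</account_id></row>", ["mycollect"])
    print(req.get_method(), req.full_url)
```

For many files, ml:put or mlcp are the better choice, since they handle batching and parallel threads for you.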
>
>
>
> So the whole process can be as simple as this:
>
>
>
> csv2xml < file.csv | ml:put -uri /myfile.xml
>
>
>
> Or something more realistic:
>
>
>
> csv2xml -header < file.csv | xslt -f translate.xsl | xsplit -n -o temp
>
> xmove -x /row/account_id *.xml
>
> ml:put -baseuri /accounts -maxthreads 10 -maxfiles 100 -collection mycollect *.xml
>
>
>
> And if you get really fancy you can actually stream this all and avoid
> temporary files, but it's a bit trickier.
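The streaming idea can be sketched in Python with a generator: each CSV record is read, turned into a document and handed on before the next one is read, so nothing is written to disk and memory use stays flat. This assumes the header names are already valid XML names and every record has an `account_id` column; `stream_docs` and its defaults are hypothetical, chosen to mirror the pipeline above:

```python
import csv
import io
import xml.etree.ElementTree as ET


def stream_docs(lines, key="account_id", base_uri="/accounts", delimiter=","):
    """Yield (uri, xml) pairs one CSV record at a time, without temporary files."""
    for record in csv.DictReader(lines, delimiter=delimiter):
        row = ET.Element("row")
        for name, value in record.items():
            ET.SubElement(row, name).text = value
        yield f"{base_uri}/{record[key]}.xml", ET.tostring(row, encoding="unicode")


if __name__ == "__main__":
    sample = io.StringIO("account_id,name\n42,Acme\n")
    for uri, doc in stream_docs(sample):
        print(uri, doc)
```

Passing an open file object instead of `io.StringIO` keeps the same behavior for large inputs, because `csv.DictReader` reads lazily; each pair can then be handed straight to whatever upload step you use.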
>
> Anyway ... lots of ways to skin the cat!
>
> *From:* [email protected] [mailto:
> [email protected]] *On Behalf Of* Jakob Fix
> *Sent:* Monday, February 17, 2014 11:17 AM
>
> *To:* MarkLogic Developer Discussion
> *Subject:* Re: [MarkLogic Dev General] csv load
>
>
>
> Hi,
>
>
>
> http://www.xmlsh.org/CommandCsv2xml (never used it myself, but it seems
> to do what you're looking for); note though that you would have to add a
> loading task after it which is also available via xmlsh. I'm sure David Lee
> can explain this more eloquently.
>
>
>  cheers,
> Jakob.
>
>
>
> On Mon, Feb 17, 2014 at 5:00 PM, Erik van der Hoeven <
> [email protected]> wrote:
>
> Gentlemen,
>
>
>
> Does anybody know a way to load a CSV file into a MarkLogic database?
>
>
>
>
>
>
> With kind regards,
>
>
>
> *Erik van der Hoeven *
>
> *Consultant Business Intelligence*
>
>
> DIKW CONSULTING BV
> Einsteinbaan 12
> 3439 NJ Nieuwegein
> M: 06-43029943
>
> E: [email protected]
>
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
>