Erik, thanks for the clarification. I was actually coming from the Node angle this time around, looking at how I could use its streams capabilities. Yes, I think I'll be able to load the CSV file using mlcp; I'll give that a go.
I would still be interested in getting the streaming functionality from Node into MarkLogic, because it seems like a logical fit. Thanks again.

cheers,
Jakob.

On Tue, Feb 9, 2016 at 1:41 AM, Erik Hennum <[email protected]> wrote:

> Hi, Jakob:
>
> We don't currently provide an equivalent to mlcp for Node.js.
>
> A content pump for Node.js might have characteristics similar to the
> following:
>
> * forming documents from input records parsed from one or more input
>   streams
> * adding documents to batches based on forest assignment
> * sending batches to the appropriate dnodes with multiple concurrent
>   requests each for multiple worker processes
> * adding workers or providing backpressure to input streams as needed to
>   maintain optimal throughput
>
> While that's possible and would be an interesting challenge, the streaming
> libraries available on npm and the Node.js Client API certainly don't do
> all of that heavy lifting by themselves.
>
> Could mlcp be used for ingestion in your environment?
>
> Erik Hennum
>
> ------------------------------
> *From:* [email protected] [[email protected]] on behalf of
> Jakob Fix [[email protected]]
> *Sent:* Monday, February 08, 2016 3:29 PM
> *To:* General MarkLogic Developer Discussion
> *Subject:* [MarkLogic Dev General] using node streams to write many
> documents into database
>
> Hi,
>
> I've found the documentation that explains how to use a WritableStream to
> get /one/ document into MarkLogic, but I couldn't find any example showing
> how to stream /many thousands/ of documents.
>
> The idea is to load a CSV file with > 1M lines as a ReadableStream with
> csv-parse and, on each "readable" event, to push the corresponding JSON
> object into MarkLogic as a document.
>
> The signature of db.documents.createWriteStream [1] seems to require a
> document URI to be present at the time of stream creation, which I cannot
> supply at that stage. The example given in the documentation on how to
> load many documents doesn't really scale to "big data proportions" [2].
>
> Thanks for any help.
>
> cheers,
> Jakob.
>
> [1] https://github.com/marklogic/node-client-api/blob/master/lib/documents.js#L468
> [2] http://docs.marklogic.com/guide/node-dev/documents#id_18341
>
> _______________________________________________
> General mailing list
> [email protected]
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
