Hi, Jakob: I'm glad to hear that mlcp may meet the practical requirement.

> I would still be interested in getting the streaming functionality from Node
> into MarkLogic because it would seem like a logical fit.

+1 on the potential and the value of the goal. Even if an optimal Node.js implementation with workers, batch assignment, and backpressure would be non-trivial, something simpler would still be useful.
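To make that concrete, below is a minimal sketch of the kind of simpler loader I have in mind. It's untested, and everything environment-specific is invented: the host and credentials, the input.csv file name, the /import/record-N.json URI scheme, and the batch size. It parses the CSV as a stream, accumulates records into batches, and pauses the parser while each batch write is in flight so that memory stays bounded:

// A minimal sketch, not production code: stream a large CSV into MarkLogic
// in batches. Connection details, file name, URI scheme, and batch size
// are placeholders; adjust for your environment.
var fs = require('fs');
var parse = require('csv-parse');        // npm install csv-parse
var marklogic = require('marklogic');    // npm install marklogic

var db = marklogic.createDatabaseClient({
  host: 'localhost', port: 8000,
  user: 'admin', password: 'admin', authType: 'DIGEST'
});

var BATCH_SIZE = 200;                    // tune for document size and cluster
var batch = [];
var count = 0;

var parser = parse({columns: true});     // first CSV line supplies the keys

parser.on('data', function(record) {
  count++;
  batch.push({
    uri: '/import/record-' + count + '.json',   // invented URI scheme
    content: record
  });
  if (batch.length >= BATCH_SIZE) {
    var docs = batch;
    batch = [];
    parser.pause();                      // crude backpressure: one batch in flight
    db.documents.write(docs).result(
      function()      { parser.resume(); },
      function(error) { console.error(error); process.exit(1); }
    );
  }
});

parser.on('end', function() {
  if (batch.length > 0) {                // flush the final partial batch
    db.documents.write(batch).result(function() {
      console.log('loaded ' + count + ' documents');
    });
  }
});

fs.createReadStream('input.csv').pipe(parser);

Batching amortizes the per-request overhead, and pausing the parser is the simplest possible form of backpressure. That still leaves everything mlcp does on the table (forest-aware batch assignment, multiple workers, retries), but it keeps a million-row load from buffering entirely in memory.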
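If one batch in flight at a time leaves the server underutilized, the write step above could be swapped for something like the following, which allows several concurrent batch writes before pausing. Consider it a very rough stand-in for the worker and backpressure behavior listed in my earlier message below; MAX_IN_FLIGHT is an invented knob, and db and parser are reused from the previous sketch:

// Replace the in-line write in the 'data' handler above with a call to
// writeBatch(docs). Up to MAX_IN_FLIGHT batches are written concurrently;
// the parser pauses when all slots are busy and resumes as writes complete.
var MAX_IN_FLIGHT = 4;                   // invented; tune against your cluster
var inFlight = 0;

function writeBatch(docs) {
  inFlight++;
  if (inFlight >= MAX_IN_FLIGHT) {
    parser.pause();                      // all slots busy: stop parsing
  }
  db.documents.write(docs).result(
    function() {
      inFlight--;
      parser.resume();                   // a slot freed up (no-op if already flowing)
    },
    function(error) { console.error(error); process.exit(1); }
  );
}

A real content pump would also assign batches by forest and size the worker pool dynamically, which is exactly the non-trivial part.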
Erik Hennum

________________________________
From: [email protected] [[email protected]] on behalf of Jakob Fix [[email protected]]
Sent: Tuesday, February 09, 2016 4:21 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] using node streams to write many documents into database

Erik,

thanks for the clarification. Actually, I was coming from the Node angle this time around to see how I could use its streams capabilities. Yes, I think I'll be able to load the CSV file using mlcp. I'll give that a go.

I would still be interested in getting the streaming functionality from Node into MarkLogic because it would seem like a logical fit.

Thanks again.

cheers,
Jakob.

On Tue, Feb 9, 2016 at 1:41 AM, Erik Hennum <[email protected]> wrote:

Hi, Jakob:

We don't currently provide an equivalent to mlcp for Node.js. A content pump for Node.js might have characteristics similar to the following:

*  forming documents from input records parsed from one or more input streams
*  adding documents to batches based on forest assignment
*  sending batches to the appropriate dnodes with multiple concurrent requests each for multiple worker processes
*  adding workers or providing backpressure to input streams as needed to maintain optimal throughput

While that's possible and would be an interesting challenge, the streaming libraries available on npm and the Node.js client API certainly don't do all of that heavy lifting by themselves.

Could mlcp be used for ingestion in your environment?


Erik Hennum

________________________________
From: [email protected] [[email protected]] on behalf of Jakob Fix [[email protected]]
Sent: Monday, February 08, 2016 3:29 PM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] using node streams to write many documents into database

Hi,

I've found the documentation that explains how to use a WritableStream to get /one/ document into MarkLogic, but I couldn't find any example showing how to stream /many thousands/ of documents.

The idea is to load a CSV file with > 1M lines as a ReadableStream, parse it with csv-parse, and on each "readable" event push the corresponding JSON object as a document into MarkLogic.

The signature of db.documents.createWriteStream [1] seems to require a document URI to be present at the time of stream creation, which I cannot supply at that point. The example given in the documentation on how to load many documents doesn't really scale to "big data proportions" [2].

Thanks for any help.

cheers,
Jakob.

[1] https://github.com/marklogic/node-client-api/blob/master/lib/documents.js#L468
[2] http://docs.marklogic.com/guide/node-dev/documents#id_18341
_______________________________________________
General mailing list
[email protected]
Manage your subscription at:
http://developer.marklogic.com/mailman/listinfo/general
