Hi, Jakob:

I'm glad to hear that mlcp may meet your practical requirement.

> I would still be interested in getting the streaming functionality from Node 
> into MarkLogic because it would seem like a logical fit.

+1 on the potential and the value of the goal.

Even though an optimal Node.js implementation with workers, batch assignment, 
and backpressure would be non-trivial, something simpler would still be useful.
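
For example, a single-threaded loader that batches parsed records and pauses 
the input stream while each batch is in flight might be a reasonable starting 
point. Here is a minimal, untested sketch (the host, credentials, file name, 
batch size, and URI scheme are all placeholders):

var fs = require('fs');
var parse = require('csv-parse');
var marklogic = require('marklogic');

var db = marklogic.createDatabaseClient({
  host: 'localhost', port: 8000,
  user: 'admin', password: 'admin', authType: 'DIGEST'
});

var BATCH_SIZE = 100;
var batch = [];
var count = 0;

var parser = fs.createReadStream('input.csv').pipe(parse({columns: true}));

// write the current batch, then invoke the continuation
function flush(done) {
  var docs = batch;
  batch = [];
  db.documents.write(docs).result()
    .then(done)
    .catch(function(error) { console.error(error); process.exit(1); });
}

parser.on('data', function(record) {
  count++;
  batch.push({
    uri: '/records/' + count + '.json',
    contentType: 'application/json',
    content: record
  });
  if (batch.length >= BATCH_SIZE) {
    parser.pause();  // crude backpressure: one batch in flight at a time
    flush(function() { parser.resume(); });
  }
});

parser.on('end', function() {
  if (batch.length > 0) {
    flush(function() { console.log('loaded ' + count + ' documents'); });
  } else {
    console.log('loaded ' + count + ' documents');
  }
});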


Erik Hennum

________________________________
From: [email protected] 
[[email protected]] on behalf of Jakob Fix 
[[email protected]]
Sent: Tuesday, February 09, 2016 4:21 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] using node streams to write many documents 
into database

Erik, thanks for the clarification. I was actually coming from the Node angle 
this time around and wanted to see how I could use its streams capabilities. 
Yes, I think I'll be able to load the CSV file using mlcp. I'll give that a go.

I would still be interested in getting the streaming functionality from Node 
into MarkLogic because it would seem like a logical fit.

Thanks again.

cheers,
Jakob.

On Tue, Feb 9, 2016 at 1:41 AM, Erik Hennum 
<[email protected]> wrote:
Hi, Jakob:

We don't currently provide an equivalent to mlcp for Node.js.

A content pump for Node.js might have characteristics similar to the following:

*  forming documents from input records parsed from one or more input streams
*  adding documents to batches based on forest assignment
*  sending batches to the appropriate d-nodes, with multiple concurrent 
requests from each of multiple worker processes
*  adding workers or providing backpressure to input streams as needed to 
maintain optimal throughput

While that's possible and would be an interesting challenge, the streaming 
libraries available on npm and the Node.js client API certainly don't do all of 
that heavy lifting by themselves.
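
To make the gap concrete: even the concurrency and backpressure aspect alone 
might look something like the sketch below. This is hypothetical outline code 
only; it ignores forest-aware batch assignment entirely, and MAX_IN_FLIGHT and 
the writeBatch helper are stand-ins, not anything the client API provides:

// Cap the number of concurrent write requests; pause the input
// stream whenever all slots are busy, resume when one frees up.
var MAX_IN_FLIGHT = 4;  // tune for the cluster
var inFlight = 0;

function settle(parser) {
  inFlight--;
  if (inFlight < MAX_IN_FLIGHT) {
    parser.resume();  // room again; keep reading input
  }
}

function writeBatch(db, parser, batch) {
  inFlight++;
  if (inFlight >= MAX_IN_FLIGHT) {
    parser.pause();  // backpressure on the input stream
  }
  db.documents.write(batch).result()
    .then(function() { settle(parser); })
    .catch(function(error) {
      console.error('batch failed', error);
      settle(parser);
    });
}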

Could mlcp be used for ingestion in your environment?


Erik Hennum


________________________________
From: [email protected] 
[[email protected]] on behalf of Jakob Fix 
[[email protected]]
Sent: Monday, February 08, 2016 3:29 PM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] using node streams to write many documents 
into database

Hi,

I've found the documentation that explains how to use a WritableStream to get 
/one/ document into MarkLogic, but I couldn't find any example showing how to 
stream /many thousands/ of documents.

The idea is to load a CSV file with > 1M lines as a ReadableStream, parse it 
with csv-parse, and on each "readable" event push the corresponding JSON 
object into MarkLogic as a document.
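
Roughly, the parsing side of what I have in mind looks like this (the file 
name and parser options are placeholders):

var fs = require('fs');
var parse = require('csv-parse');

var parser = fs.createReadStream('big.csv').pipe(parse({columns: true}));

parser.on('readable', function() {
  var record;
  while ((record = parser.read()) !== null) {
    // here I'd like to push `record` into MarkLogic as a JSON document
  }
});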

The signature of db.documents.createWriteStream [1] seems to require a 
document URI at the time the stream is created, which I cannot supply that 
early. The example in the documentation on loading many documents doesn't 
really scale to "big data" proportions ... [2].
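
For reference, the single-document pattern looks roughly like this, with the 
URI fixed at stream creation, which is exactly the constraint I'm running 
into (the URI and content here are just placeholders):

var ws = db.documents.createWriteStream({
  uri: '/example/one.json',  // must be supplied up front
  contentType: 'application/json'
});
ws.result(function(response) {
  console.log('wrote one document');
});
ws.write(JSON.stringify({hello: 'world'}), 'utf8');
ws.end();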

Thanks for any help.

cheers,
Jakob.

[1] https://github.com/marklogic/node-client-api/blob/master/lib/documents.js#L468
[2] http://docs.marklogic.com/guide/node-dev/documents#id_18341


