Erik, thanks for the clarification. Actually, I was coming at this from the
Node angle this time around, to see how I could use its streams capabilities.
Yes, I think I'll be able to load the CSV file using mlcp. I'll give that a
go.

I would still be interested in getting the streaming functionality from Node
into MarkLogic, because it seems like a logical fit.
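
To make that concrete, here is a rough sketch of what I have in mind: batch up
the records coming out of csv-parse and write each batch with
db.documents.write(), pausing the parser while a batch is in flight as a crude
form of backpressure. The connection details, BATCH_SIZE, and the URI scheme
are placeholders I made up, and I'm assuming db.documents.write() accepts an
array of document descriptors.

// Rough sketch only -- connection details, batch size, and URI scheme are
// placeholders, not a tested implementation.
var fs = require('fs');
var parse = require('csv-parse');
var marklogic = require('marklogic');

var db = marklogic.createDatabaseClient({
  host: 'localhost', port: 8000, user: 'admin', password: 'admin'
});

var BATCH_SIZE = 500;
var batch = [];
var count = 0;

// columns: true makes csv-parse emit one JSON object per CSV line
var parser = parse({ columns: true });

function flush(docs) {
  // write a whole batch of document descriptors in a single request
  return db.documents.write(docs).result();
}

parser.on('data', function (record) {
  count += 1;
  batch.push({ uri: '/csv/' + count + '.json', content: record });
  if (batch.length >= BATCH_SIZE) {
    var docs = batch;
    batch = [];
    parser.pause();               // crude backpressure while the batch is in flight
    flush(docs)
      .then(function () { parser.resume(); })
      .catch(function (err) { console.error(err); parser.resume(); });
  }
});

parser.on('end', function () {
  if (batch.length > 0) {
    flush(batch).then(function () {
      console.log('wrote ' + count + ' documents');
    });
  }
});

fs.createReadStream('input.csv').pipe(parser);

The main point is batching the writes instead of issuing one request per
record; per-record requests would be far too chatty for a million-plus lines.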

Thanks again.

cheers,
Jakob.

On Tue, Feb 9, 2016 at 1:41 AM, Erik Hennum <[email protected]>
wrote:

> Hi, Jakob:
>
> We don't currently provide an equivalent to mlcp for Node.js.
>
> A content pump for Node.js might have characteristics similar to the
> following:
>
> *  forming documents from input records parsed from one or more input
> streams
> *  adding documents to batches based on forest assignment
> *  sending batches to the appropriate d-nodes, with each of multiple worker
> processes issuing several concurrent requests
> *  adding workers or providing backpressure to input streams as needed to
> maintain optimal throughput
>
> While that's possible and would be an interesting challenge, the streaming
> libraries available on npm and the Node.js client API certainly don't do
> all of that heavy lifting by themselves.
>
> Could mlcp be used for ingestion in your environment?
>
>
> Erik Hennum
>
>
> ------------------------------
> *From:* [email protected] [
> [email protected]] on behalf of Jakob Fix [
> [email protected]]
> *Sent:* Monday, February 08, 2016 3:29 PM
> *To:* General Mark Logic Developer Discussion
> *Subject:* [MarkLogic Dev General] using node streams to write many
> documents into database
>
> Hi,
>
> I've found the documentation that explains how to use a WritableStream to
> get /one/ document into MarkLogic, but I couldn't find any example that shows
> how one could stream /many thousands/ of documents.
>
> The idea is to load a CSV file with > 1M lines as a ReadableStream, pipe it
> through csv-parse, and on each "readable" event push the corresponding JSON
> object into MarkLogic as a document.
>
> The signature of db.documents.createWriteStream [1] seems to require a
> document URI to be supplied when the stream is created, which I cannot do at
> that point. The example given in the documentation on how to load many
> documents doesn't really scale to "big data" proportions ... [2].
>
> Thanks for any help.
>
> cheers,
> Jakob.
>
> [1]
> https://github.com/marklogic/node-client-api/blob/master/lib/documents.js#L468
> [2] http://docs.marklogic.com/guide/node-dev/documents#id_18341
>
_______________________________________________
General mailing list
[email protected]
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general
