We've written a writeStream that updates product information in a 
Postgres database, and we'd like some advice on managing how many 
database writes we run concurrently.

The complete pipeline reads a CSV file, pipes it to a transform stream 
that converts multiple rows into a product family (one product with many 
sizes and colors), and then pipes that to the writeStream, which writes 
to the database. The writeStream calls an Import function that does 
multiple reads and writes to determine whether the product family 
exists, needs to be updated, needs a brand created, etc.
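
For concreteness, here is a minimal sketch of that pipeline. The CSV 
parser, the transform, and the stream wiring shown here are placeholders 
for our actual code, so the names and details are illustrative only:

    var fs = require('fs');
    var stream = require('stream');
    var csv = require('csv-parser'); // placeholder: any streaming CSV parser

    // Accumulates rows until a full product family (one product, many
    // sizes and colors) is ready, then pushes it downstream.
    var toProductFamily = new stream.Transform({ objectMode: true });
    toProductFamily._transform = function (row, enc, done) {
      // ...collect rows, this.push(family) when one is complete...
      done();
    };

    // Calls our Import function, which does the reads and writes to see
    // whether the family exists, needs updating, needs a brand, etc.
    var importer = new stream.Writable({ objectMode: true });
    importer._write = function (family, enc, done) {
      Import(family, done);
    };

    fs.createReadStream('products.csv')
      .pipe(csv())
      .pipe(toProductFamily)
      .pipe(importer);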

The issue is that we can dramatically improve import performance by 
allowing multiple Imports to occur in parallel, since they are all 
constrained by database access times. However, if we let too many go at 
once, Node consumes too much memory and slows to a crawl.

Our pre-streams Import implementation set an arbitrary importBatchSize 
and completed the import of that many product families as a batch before 
beginning the next batch. We're now trying something similar using 
async.eachLimit <https://github.com/caolan/async#eachLimit> in the 
writeStream. The inefficiency is that we fully write out each batch 
before starting the next one. Ultimately, async.queue might be a better 
solution because it would keep the pipeline full of importBatchSize 
product families at all times. However, it will be trickier to implement 
because we'll need to listen for the saturated callback to know when the 
queue can't accept more product families, and then propagate that 
back-pressure upstream; a sketch of what we have in mind follows.
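
Roughly, the async.queue version we have in mind looks like the sketch 
below (the names and the concurrency number are illustrative, not our 
real code). The idea is to hold the writeStream's callback while the 
queue is saturated and release it once there is room again, so the pipe 
propagates the back-pressure all the way up to the CSV read stream:

    var async = require('async');
    var stream = require('stream');

    var CONCURRENCY = 10; // stand-in for importBatchSize; tuned against memory

    var q = async.queue(function (family, callback) {
      Import(family, callback); // our existing multi-read/write Import
    }, CONCURRENCY);

    var saturated = false;
    var heldCallback = null;

    q.saturated = function () { saturated = true; };
    q.empty = function () {
      // The last queued family was handed to a worker, so there is room
      // again; releasing the held callback lets the pipe resume reading.
      saturated = false;
      if (heldCallback) {
        var done = heldCallback;
        heldCallback = null;
        done();
      }
    };

    var importer = new stream.Writable({ objectMode: true });
    importer._write = function (family, enc, done) {
      q.push(family, function (err) {
        if (err) { importer.emit('error', err); }
      });
      if (saturated) {
        heldCallback = done; // hold off until the queue has drained
      } else {
        done();
      }
    };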

The question is ultimately whether there's a better way to do this. 
We're trying to stay below 512 or 1024 MB of RAM so we can run on 
Heroku. Does it make sense to just test until we find an importBatchSize 
that fits our RAM budget? How crazy is it to look at 
process.memoryUsage() and dynamically scale our concurrency based on 
current memory usage? Is it OK to mix streams and async, or is there a 
better abstraction that would configure the concurrency automatically?
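
To make the process.memoryUsage() question concrete, the dynamic scaling 
we're imagining is roughly the following (the thresholds are made up, 
and it assumes async honors changes to q.concurrency at runtime, which 
we'd want to verify):

    var SOFT_LIMIT = 400 * 1024 * 1024; // stay well under a 512 MB dyno
    var MIN_CONCURRENCY = 2;
    var MAX_CONCURRENCY = 20;

    setInterval(function () {
      var rss = process.memoryUsage().rss;
      if (rss > SOFT_LIMIT && q.concurrency > MIN_CONCURRENCY) {
        q.concurrency -= 1; // shed load as we approach the dyno limit
      } else if (rss < SOFT_LIMIT * 0.8 && q.concurrency < MAX_CONCURRENCY) {
        q.concurrency += 1; // ramp back up when memory is comfortable
      }
    }, 1000).unref(); // don't keep the process alive just for this timer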

Thanks in advance for your suggestions and/or pointers to code that has 
dealt with similar issues.
--
Dan Kohn
COO, Shopbeam <www.shopbeam.com>
