We've written a writeStream that updates product information in a Postgres database, and would like some advice on managing how many database writes we run concurrently.
The complete stream reads a CSV file, pipes it to a transform stream that converts multiple rows into a product family (one product with many sizes and colors), and then pipes to the writeStream, which writes to the database. The writeStream calls an Import function that does multiple reads and writes to determine whether the product family already exists, needs to be updated, needs a brand created, etc.

The issue is that we can dramatically improve import performance by letting multiple Imports run in parallel, since each one is constrained mostly by database access times. However, if we let too many go at once, Node consumes too much memory and slows to a crawl.

Our pre-streams Import implementation picked an arbitrary importBatchSize, imported that many product families as a batch, and only then began the next batch. We're now trying something similar inside the writeStream using async.eachLimit <https://github.com/caolan/async#eachLimit> (first sketch at the end of this message). The inefficiency is that we finish each batch completely before starting the next, so concurrency drops to zero at every batch boundary.

Ultimately, async.queue might be a better solution because it would keep importBatchSize product families in flight at all times. It will be trickier to implement, though: we'll need to watch the queue's saturated callback (or its length) to know when it can't accept more product families, and then propagate that back pressure upstream (second sketch below).

The question is ultimately whether there's a better way to do this. We're trying to stay below 512 or 1024 MB of RAM so we can run on Heroku. Does it make sense to just test until we find an importBatchSize that fits our RAM budget? How crazy is it to look at process.memoryUsage() and dynamically scale our concurrency based on current memory usage (third sketch below)? Is it OK to mix streams and async like this, or is there a better abstraction that would configure the concurrency automatically?

Thanks in advance for your suggestions and/or pointers to code that has dealt with similar issues.

--
Dan Kohn
COO, Shopbeam <www.shopbeam.com>
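
P.S. To make the questions concrete, here are three rough sketches. They are simplified, not our real code: ImportStream, importProductFamily(family, callback), and all of the numbers are stand-ins for the actual Import implementation.

First, roughly what we have today: buffer importBatchSize product families, then flush the whole batch through async.eachLimit before accepting more from the pipe.

    var Writable = require('stream').Writable;
    var util = require('util');
    var async = require('async');

    function ImportStream(opts) {
      Writable.call(this, { objectMode: true });
      this.importBatchSize = opts.importBatchSize || 50;
      this.concurrency = opts.concurrency || 10;
      this.batch = [];
    }
    util.inherits(ImportStream, Writable);

    ImportStream.prototype._write = function (family, encoding, callback) {
      this.batch.push(family);
      if (this.batch.length < this.importBatchSize) return callback();

      var batch = this.batch;
      this.batch = [];
      // Run up to `concurrency` Imports at once, but don't ask the pipe
      // for more until the entire batch has finished. (Flushing the final
      // partial batch on 'finish' is omitted here.)
      async.eachLimit(batch, this.concurrency, importProductFamily, callback);
    };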
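
Second, the async.queue idea, reusing the ImportStream skeleton above but replacing _write. What seems to make it workable is that a Writable exerts back pressure simply by not calling the _write callback until it's ready for more, so pipe() pauses the CSV/transform side for us and we never have to reach upstream explicitly. (This assumes the constructor also does this.queue = async.queue(importProductFamily, opts.concurrency).)

    ImportStream.prototype._write = function (family, encoding, callback) {
      var q = this.queue;
      var self = this;

      q.push(family, function (err) {
        if (err) self.emit('error', err);
      });

      if (q.length() < q.concurrency) {
        // The queue still has room: ask for the next product family now.
        callback();
      } else {
        // The queue is saturated: hold the callback until the workers
        // clear the backlog, which propagates back pressure up the pipe.
        q.empty = function () {
          q.empty = null;
          callback();
        };
      }
    };

The rest of the pipeline would stay as it is, e.g. csvStream.pipe(toProductFamilies).pipe(new ImportStream({concurrency: 10})) (stream names invented for illustration).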
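
Third, the part I'm least sure about: a memory-aware throttle. async.queue does allow changing concurrency after the queue is created, so something like the following could nudge it up and down, where q is the stream's queue from the sketch above. The watermarks and step size are invented numbers aimed at a 512 MB dyno.

    var MB = 1024 * 1024;
    var HIGH_WATER = 400 * MB;   // back off well before the dyno limit
    var LOW_WATER  = 300 * MB;
    var MAX_CONCURRENCY = 25;

    setInterval(function () {
      var rss = process.memoryUsage().rss;
      if (rss > HIGH_WATER && q.concurrency > 1) {
        q.concurrency--;   // shed load while memory is high
      } else if (rss < LOW_WATER && q.concurrency < MAX_CONCURRENCY) {
        q.concurrency++;   // cautiously ramp back up
      }
    }, 5000).unref();

Adjusting q.concurrency on the fly is supported; the open question is whether steering it off process.memoryUsage() is sane at all, or whether a better abstraction already exists.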
