This task description is too generic for a Node-specific answer. We don't
know how the file is processed. We don't know whether the CSV processing
needs to be atomic; it seems so, but that isn't clear. We don't know whether
you can abort it. We don't know whether you want to block the upload
function completely, or are blocking it only because you can't abort the
previous job. Something like this needs to be designed with the task
specifics in mind.

Here are a few scenarios, depending on your requirements.

Assuming the 100,000 rows in the CSV are individual "work items" and the
whole CSV is a "batch of work", you can, on upload, simply create a queue
of 100,000 things to process. Also keep a hashmap of these work items so
you can address them by batch. Then have a work dispatch protocol and a
Node microservice that takes the rows for processing one by one, moving
each off the batch queue into another queue, "pending batch finish". Once
the whole batch is done, mark those "pending finish" items as completed and
clear the batch. Expose a "Cancel batch work" function to the user: if they
click "cancel current batch", you clear all the pending tasks so that your
worker microservice stops processing them. Also mark the batch as canceled,
so you can clean up the work items that were already completed.
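
To make the batch idea a bit more concrete, here is a minimal in-memory
sketch. Every name in it (Batch, WorkItem, takeNext, markProcessed, cancel)
is made up for illustration, and a real version would keep this state in a
shared store rather than in one process.

```typescript
// Hypothetical sketch: one uploaded CSV as a cancelable batch of work items.
type ItemState = "queued" | "processing" | "pendingBatchFinish" | "completed" | "canceled";

interface WorkItem {
  id: number;
  row: string;
  state: ItemState;
}

class Batch {
  private queue: WorkItem[] = [];                 // rows nobody has picked up yet
  private items: { [id: number]: WorkItem } = {}; // lookup of every row by id
  private total = 0;
  private finished = 0;
  private canceled = false;

  constructor(rows: string[]) {
    rows.forEach((row, id) => {
      const item: WorkItem = { id, row, state: "queued" };
      this.queue.push(item);
      this.items[id] = item;
    });
    this.total = rows.length;
  }

  // A worker takes one row off the queue; undefined means nothing is left to do.
  takeNext(): WorkItem | undefined {
    if (this.canceled) return undefined;
    const item = this.queue.shift();
    if (item) item.state = "processing";
    return item;
  }

  // A processed row waits in "pendingBatchFinish" until every row is done.
  markProcessed(id: number): void {
    const item = this.items[id];
    if (!item || item.state !== "processing") return;
    item.state = "pendingBatchFinish";
    this.finished++;
    if (this.finished === this.total) {
      this.allItems().forEach((i) => (i.state = "completed"));
    }
  }

  // Drop all queued rows and return the ids of already-processed rows to clean up.
  cancel(): number[] {
    if (this.finished === this.total) return []; // batch already completed
    this.canceled = true;
    this.queue = [];
    const processedIds: number[] = [];
    this.allItems().forEach((item) => {
      if (item.state === "pendingBatchFinish") processedIds.push(item.id);
      item.state = "canceled";
    });
    return processedIds;
  }

  stateOf(id: number): ItemState | undefined {
    const item = this.items[id];
    return item ? item.state : undefined;
  }

  private allItems(): WorkItem[] {
    return Object.keys(this.items).map((key) => this.items[Number(key)]);
  }
}

// Usage: process one row, then cancel the rest of the batch.
const demoBatch = new Batch(["a,1", "b,2", "c,3"]);
const firstItem = demoBatch.takeNext();
if (firstItem) demoBatch.markProcessed(firstItem.id);
const idsToCleanUp = demoBatch.cancel(); // only row 0 was processed
```

After cancel(), takeNext() returns nothing, so workers polling the batch
simply run out of work.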

If you can't break the processing into items (maybe you're aggregating
over those 100,000 rows), perhaps you can run the aggregation in an
interruptible loop. Then your "Cancel batch work" would first check whether
any aggregations are running and interrupt/abort them, then proceed as
planned and run the new file.
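
One way to read "interruptible loop" is to aggregate in small chunks and
check a cancel condition between chunks. The sketch below assumes the
aggregation is a plain sum; InterruptibleSum and runJob are hypothetical
names, and the chunk size is assumed to be positive.

```typescript
// Hypothetical sketch: an aggregation that can be aborted between chunks.
class InterruptibleSum {
  private index = 0;
  private sum = 0;
  private aborted = false;

  // chunkSize is assumed to be a positive integer.
  constructor(private values: number[], private chunkSize: number) {}

  abort(): void {
    this.aborted = true;
  }

  // Process one chunk; returns true while there is more work to do.
  step(): boolean {
    if (this.aborted) return false;
    const end = Math.min(this.index + this.chunkSize, this.values.length);
    while (this.index < end) {
      this.sum += this.values[this.index];
      this.index++;
    }
    return this.index < this.values.length;
  }

  processedCount(): number {
    return this.index;
  }

  // The sum is only meaningful if the job ran to the end without being aborted.
  result(): number | undefined {
    const finished = !this.aborted && this.index === this.values.length;
    return finished ? this.sum : undefined;
  }
}

// Drive the job chunk by chunk, checking for cancellation before each chunk.
function runJob(job: InterruptibleSum, shouldCancel: () => boolean): void {
  for (;;) {
    if (shouldCancel()) {
      job.abort();
      return;
    }
    if (!job.step()) return;
  }
}

// Usage: one job canceled after two chunks, one job run to completion.
const rowValues = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const canceledJob = new InterruptibleSum(rowValues, 3);
runJob(canceledJob, () => canceledJob.processedCount() >= 6);
const fullJob = new InterruptibleSum(rowValues, 3);
runJob(fullJob, () => false);
```

In a real Node service the driver would hand control back to the event loop
between chunks, otherwise the cancel request could never be received while
the loop is running.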

Your least flexible option, i.e. when you cannot stop the processing once
it has started, is to at least provide an upload queue: the first file
uploaded gets processed, and you expose an endpoint where additional CSVs
can be uploaded. They are just sent to the server and wait there. You can
still cancel those "pending" CSVs and upload new ones instead, even if you
can't interrupt the main, running CSV. Then expose a simple "status"
endpoint that reports progress to the user, e.g. "processed 40,000 of
100,000 rows, 3 CSVs pending processing".
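
The bookkeeping behind such an upload queue and status endpoint could look
roughly like this. UploadQueue and its methods are invented for the
example; the HTTP endpoints themselves are left out, and the status text is
built without thousands separators to keep the code short.

```typescript
// Hypothetical sketch: one running CSV that cannot be stopped, plus cancelable pending CSVs.
interface CsvJob {
  name: string;
  totalRows: number;
  processedRows: number;
}

class UploadQueue {
  private running: CsvJob | null = null;
  private pending: CsvJob[] = [];

  // The first upload starts running; later uploads wait in line.
  upload(name: string, totalRows: number): void {
    const job: CsvJob = { name, totalRows, processedRows: 0 };
    if (this.running === null) this.running = job;
    else this.pending.push(job);
  }

  // Only pending files can be canceled; the running one is left alone.
  cancelPending(name: string): boolean {
    const before = this.pending.length;
    this.pending = this.pending.filter((job) => job.name !== name);
    return this.pending.length < before;
  }

  reportProgress(processedRows: number): void {
    if (this.running) this.running.processedRows = processedRows;
  }

  // When the running file is done, promote the next pending file (if any).
  finishRunning(): void {
    this.running = this.pending.shift() || null;
  }

  statusText(): string {
    const waiting = `${this.pending.length} CSVs pending processing`;
    if (this.running === null) return `idle, ${waiting}`;
    const { processedRows, totalRows } = this.running;
    return `processed ${processedRows} of ${totalRows} rows, ${waiting}`;
  }
}

// Usage: one running file, four uploads behind it, one of them canceled.
const uploads = new UploadQueue();
uploads.upload("first.csv", 100000);
["second.csv", "third.csv", "fourth.csv", "fifth.csv"].forEach((fileName) =>
  uploads.upload(fileName, 100000)
);
const canceledPending = uploads.cancelPending("third.csv"); // a pending file
const canceledRunning = uploads.cancelPending("first.csv"); // the running file
uploads.reportProgress(40000);
const statusLine = uploads.statusText();
```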

You would have to keep all those locks and queues outside the running
process (Redis is probably the simplest to use), because you might be
running these tasks (upload, processing, status) on different servers or,
at the very least, on different workers in your Node cluster instance.
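
As a rough illustration of a lock that lives outside the process, the
sketch below defines a small LockStore interface and an in-memory stand-in
for it. The interface, the key name and the 60-second timeout are all
assumptions for the example. With Redis, setIfAbsent would map onto the SET
command with its NX and PX options, and deleteIfEquals onto a
compare-then-delete done atomically on the server (commonly a small Lua
script); a real client would also be asynchronous.

```typescript
// Hypothetical interface for a shared lock store (e.g. backed by Redis).
interface LockStore {
  // Set key only if it is absent or expired; returns true when the lock was taken.
  setIfAbsent(key: string, value: string, ttlMs: number): boolean;
  // Delete key only if it still holds the given value; returns true when released.
  deleteIfEquals(key: string, value: string): boolean;
}

// In-memory stand-in so the sketch runs without a server. Not shared across processes.
class InMemoryLockStore implements LockStore {
  private entries: { [key: string]: { value: string; expiresAt: number } } = {};

  constructor(private now: () => number) {}

  setIfAbsent(key: string, value: string, ttlMs: number): boolean {
    const existing = this.entries[key];
    if (existing && existing.expiresAt > this.now()) return false;
    this.entries[key] = { value, expiresAt: this.now() + ttlMs };
    return true;
  }

  deleteIfEquals(key: string, value: string): boolean {
    const existing = this.entries[key];
    if (!existing || existing.expiresAt <= this.now() || existing.value !== value) {
      return false;
    }
    delete this.entries[key];
    return true;
  }
}

const PROCESSING_LOCK = "csv:processing-lock"; // assumed key name
const LOCK_TTL_MS = 60000; // assumed timeout so a crashed worker cannot hold the lock forever

function tryStartProcessing(store: LockStore, workerId: string): boolean {
  return store.setIfAbsent(PROCESSING_LOCK, workerId, LOCK_TTL_MS);
}

function finishProcessing(store: LockStore, workerId: string): boolean {
  return store.deleteIfEquals(PROCESSING_LOCK, workerId);
}

// Usage with a fake clock: only one worker may process at a time.
let fakeNow = 0;
const lockStore = new InMemoryLockStore(() => fakeNow);
const workerAStarted = tryStartProcessing(lockStore, "worker-a");
const workerBStarted = tryStartProcessing(lockStore, "worker-b");
const workerBReleased = finishProcessing(lockStore, "worker-b");
const workerAReleased = finishProcessing(lockStore, "worker-a");
```

Storing the worker id as the lock value is what lets a worker release only
its own lock and not one that another worker has taken in the meantime.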

But these are very rough guesses. With a lot more detail, I could provide
a better overview. Shoot if you have other questions.