Hello,

I have data processing logic that takes an Iterable<Some> as input, i.e. pretty much the same as the reducer's API. But I need to use this code in a Mapper, where each element arrives as a separate map() method invocation.

To solve the problem (at least for now), I'm doing the following:
* run the processing code in a thread which I start in setup() and wait for its completion in cleanup()
* keep a buffer which I fill with map input items (and serve the Iterable object from this buffer for as long as it has something)
* write to the buffer until it is full and only then switch to the thread which does the processing. (Assumption: the processing logic always reads data from the buffer till the end; if processing fails, then the whole job is marked as failed.)

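
For illustration, the hand-off described above can be sketched roughly as follows. This is not the actual implementation: the class name PushToPullBridge and its methods start(), push() and finish() are made up for this example, and it swaps the hand-rolled "fill the buffer, then switch threads" logic for a bounded java.util.concurrent.ArrayBlockingQueue, which blocks the producer when the buffer is full. It is a sketch under those assumptions only.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper: turns push-style calls (like map()) into a pull-style Iterable. */
class PushToPullBridge<T> {
  // Marker object put into the buffer to signal "no more input".
  private static final Object END = new Object();

  private final BlockingQueue<Object> buffer;
  private final Thread worker;
  private volatile Throwable failure;

  PushToPullBridge(int capacity, Consumer<Iterable<T>> processor) {
    buffer = new ArrayBlockingQueue<>(capacity);
    worker = new Thread(() -> {
      try {
        // Iterable has a single abstract method, so a method reference works here.
        processor.accept(this::iterator);
      } catch (Throwable t) {
        failure = t; // reported to the caller in push() or finish()
      }
    });
  }

  /** Would be called from setup(). */
  void start() {
    worker.start();
  }

  /** Would be called from map(); items must be non-null. */
  void push(T item) throws InterruptedException {
    offer(item);
  }

  /** Would be called from cleanup(): signals end of input and waits for the processor. */
  void finish() throws InterruptedException {
    offer(END);
    worker.join();
    if (failure != null) {
      throw new IllegalStateException("processing failed", failure);
    }
  }

  // Blocks while the buffer is full, but gives up if the processing thread is gone,
  // so a failed processor cannot leave the producer waiting forever.
  private void offer(Object o) throws InterruptedException {
    while (!buffer.offer(o, 100, TimeUnit.MILLISECONDS)) {
      if (!worker.isAlive()) {
        throw new IllegalStateException("processor stopped before consuming all input", failure);
      }
    }
  }

  // Single-use iterator that blocks until the next item (or the END marker) arrives.
  private Iterator<T> iterator() {
    return new Iterator<T>() {
      private Object next;

      @Override
      public boolean hasNext() {
        if (next == null) {
          try {
            next = buffer.take();
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
          }
        }
        return next != END;
      }

      @Override
      @SuppressWarnings("unchecked")
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        T item = (T) next;
        next = null;
        return item;
      }
    };
  }
}
```

In a Mapper this would be wired as start() in setup(), push(value) in map() and finish() in cleanup(), so that an exception from the processing thread fails the task. One caveat to check in any buffering variant: Hadoop commonly reuses the key and value objects passed to map(), so items may need to be copied before they are put into the buffer.
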
I don't see why it should cause any noticeable performance degradation: switches between threads are quite rare. The approach also looks safe to me. Could anyone please confirm that? Or, in case there's a better solution, please let me know.

Btw, a rough cut of the implementation can be found here (small class): https://github.com/sematext/HBaseHUT/blob/master/src/main/java/com/sematext/hbase/hut/UpdatesProcessingMrJob.java. It is in a working state (the unit tests pass, at least).

Thank you in advance!

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
