There is one thing I really don't like about the current implementation: 
DefaultCollector, and any other collection that keeps one (or more) 
objects per entry.
We can't assume that if you double the number of objects in memory (and 
if you map each entry to a bigger object, that is in fact what you do), 
they will still fit into it. Even more so if you map the objects from the 
cache store as well.
I believe we have to use a Collector implemented as a bounded queue, and 
start the reduction phase on the entries that have already been mapped, in 
parallel with the mapper phase. Otherwise, say hello to OOME.
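
To make the bounded-queue idea concrete, here is a minimal sketch using only the JDK. It is not the Infinispan Collector API: the class name, the word-count mapper and the poison-pill marker are all made up for illustration. The mapper blocks on put() once the queue is full, so at most `capacity` mapped pairs are waiting at any time, while a reducer thread drains them concurrently.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; none of these names exist in Infinispan.
class BoundedCollectorSketch {

    // Marker entry telling the reducer that the map phase is over.
    private static final Map.Entry<String, Integer> POISON =
            new SimpleImmutableEntry<>(null, null);

    static Map<String, Integer> wordCount(List<String> entries, int capacity)
            throws Exception {
        // The "collector": holds at most 'capacity' mapped pairs at once.
        BlockingQueue<Map.Entry<String, Integer>> queue =
                new ArrayBlockingQueue<>(capacity);
        ExecutorService reducerPool = Executors.newSingleThreadExecutor();
        try {
            // Reduction starts immediately and runs alongside the mapping.
            Future<Map<String, Integer>> reduced = reducerPool.submit(() -> {
                Map<String, Integer> totals = new HashMap<>();
                for (Map.Entry<String, Integer> e = queue.take();
                     e != POISON;
                     e = queue.take()) {
                    totals.merge(e.getKey(), e.getValue(), Integer::sum);
                }
                return totals;
            });

            // Map phase (on the calling thread here, for brevity).
            // put() blocks while the queue is full, which throttles the mapper.
            for (String entry : entries) {
                for (String word : entry.split("\\s+")) {
                    if (!word.isEmpty()) {
                        queue.put(new SimpleImmutableEntry<>(word, 1));
                    }
                }
            }
            queue.put(POISON);
            return reduced.get();
        } finally {
            reducerPool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(wordCount(Arrays.asList("a b a", "b c", "a"), 2));
    }
}
```

Only the reduced totals grow with the number of distinct keys; the intermediate pairs never exceed the queue capacity. A real implementation would also have to unblock the mapper if the reducer fails, which this sketch does not handle.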

Cheers

Radim

PS: And don't keep all the futures just to check that all tasks have 
finished; use ExecutorAllCompletionService instead.
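
To illustrate the point about futures, here is a JDK-only stand-in that tracks completion with a counter instead of retaining one Future per task. It is not the ExecutorAllCompletionService class and makes no claim about that class's API; CompletionTracker and its methods are hypothetical names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of waiting for all tasks without a list of futures.
class CompletionTrackerSketch {

    static final class CompletionTracker {
        private final Executor executor;
        private long pending;            // guarded by 'this'
        private Throwable firstFailure;  // guarded by 'this'

        CompletionTracker(Executor executor) {
            this.executor = executor;
        }

        void submit(Runnable task) {
            synchronized (this) {
                pending++;
            }
            try {
                executor.execute(() -> {
                    Throwable failure = null;
                    try {
                        task.run();
                    } catch (Throwable t) {
                        failure = t;
                    }
                    taskDone(failure);
                });
            } catch (RuntimeException rejected) {
                // The task never started, so undo the count before rethrowing.
                taskDone(null);
                throw rejected;
            }
        }

        private synchronized void taskDone(Throwable failure) {
            if (failure != null && firstFailure == null) {
                firstFailure = failure;
            }
            pending--;
            notifyAll();
        }

        // Blocks until every submitted task has finished; reports the first failure.
        synchronized void awaitAll() throws InterruptedException, ExecutionException {
            while (pending > 0) {
                wait();
            }
            if (firstFailure != null) {
                throw new ExecutionException(firstFailure);
            }
        }
    }

    static int sumOfSquares(int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            CompletionTracker tracker = new CompletionTracker(pool);
            AtomicInteger sum = new AtomicInteger();
            for (int i = 1; i <= n; i++) {
                final int v = i;
                tracker.submit(() -> sum.addAndGet(v * v));
            }
            tracker.awaitAll();
            return sum.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(10));
    }
}
```

The memory held for bookkeeping is a single counter and one failure reference, regardless of how many tasks are submitted.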

On 12/06/2013 05:18 PM, Mircea Markus wrote:
> Thanks Vladimir, I like the hands on approach!
> Adding -dev, there's a lot of interest around the parallel M/R so I think 
> others will have some thoughts on it as well.
>
> So what you're basically doing in your branch is iterate over all the keys in 
> the cache and then for each key invoke the mapping in a separate thread. 
> Whilst this would work, I think it has some drawbacks:
> - the iteration over the keys in the container happens in sequence, even 
> though the mapping phases happen in parallel. This speeds things up a bit, 
> but not as much as having the iteration happen in parallel, especially when 
> the mapper is fast, which I think is pretty common.
> - the StatelessTask + some smaller objects are being created for each 
> iterated key. That's a lot of noise for the GC imo
>
> I think delegating the parallel iteration to the DataContainer (similar to 
> AdvancedCacheLoader.process (Executor)) would be a better approach IMO:
> - the logic is reusable for other components as well, such as querying (to 
> implement full-scan-like search) or a general purpose parallel iterator over 
> the keys
> - object creation is reduced
> - the DefaultDataContainer uses an EquivalentConcurrentHashMapV8 for holding 
> the entries, which already supports parallel iteration, so the heavy lifting 
> is already in place
>
> On Dec 4, 2013, at 5:16 PM, Vladimir Blagojevic <vblag...@redhat.com> wrote:
>
>> Here is my M/R parallel execution solution updated to master 
>> https://github.com/vblagoje/infinispan/tree/t_2284_new
>>
>> Now, I'll work on your solution which I am starting to like actually the 
>> more I think about it. Although I have to admit that I would eviscerate some 
>> of your interfaces like these KeyFilters into more prominent packages so we 
>> can all use the same interfaces. Also I would see if we can genericize some 
>> of your interfaces and implementations.
>>
>> Will keep you updated.
>>
>> Vladimir
> Cheers,


-- 
Radim Vansa <rva...@redhat.com>
JBoss DataGrid QA
