> On 9 Dec 2013, at 08:10, Radim Vansa <rva...@redhat.com> wrote:
>
> There is one thing I really don't like about the current implementation:
> DefaultCollector. And any other collection that keeps one (or more)
> object per entry.
> We can't assume that if you double the number of objects in memory (and
> in fact, if you map entry to bigger object, you do that), they'd still
> fit into it. Moreover, if you map the objects from cache store as well.
> I believe we have to use Collector implemented as bounded queue, and
> start reduction phase on the entries that have been mapped in parallel
> to the mapper phase. Otherwise, say hello to OOME.
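To make the bounded-queue idea concrete, here is a minimal sketch using only JDK classes. It is not Infinispan's Collector API: the class name, the `wordCount` method, the `DONE` sentinel and the word-count workload are all hypothetical, chosen just to show a mapper that blocks when the queue is full while a reducer drains it concurrently.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names come from Infinispan.
final class BoundedCollectorSketch {
    // Sentinel telling the reducer that the mapper has finished (compared by identity).
    private static final Map.Entry<String, Integer> DONE =
            new AbstractMap.SimpleImmutableEntry<>("", 0);

    static Map<String, Integer> wordCount(List<String> lines, int capacity) {
        // At most 'capacity' intermediate pairs are held in memory at any time.
        BlockingQueue<Map.Entry<String, Integer>> queue = new ArrayBlockingQueue<>(capacity);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Reducer: folds intermediate pairs while the mapper is still running.
            Future<Map<String, Integer>> reducer = pool.submit(() -> {
                Map<String, Integer> totals = new HashMap<>();
                for (Map.Entry<String, Integer> e = queue.take(); e != DONE; e = queue.take()) {
                    totals.merge(e.getKey(), e.getValue(), Integer::sum);
                }
                return totals;
            });
            // Mapper: put() blocks once 'capacity' pairs are waiting to be reduced.
            Future<Void> mapper = pool.submit(() -> {
                try {
                    for (String line : lines) {
                        for (String word : line.split("\\s+")) {
                            if (!word.isEmpty()) {
                                queue.put(new AbstractMap.SimpleImmutableEntry<>(word, 1));
                            }
                        }
                    }
                } finally {
                    queue.put(DONE); // always release the reducer, even if mapping fails
                }
                return null;
            });
            mapper.get();
            return reducer.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Calling `BoundedCollectorSketch.wordCount(lines, 1024)` keeps at most 1024 intermediate pairs queued regardless of input size. The reduced result still has to fit in memory, so this bounds only the collector, not the final output.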
Agreed, that's indeed a problem. Not sure it's related to parallel iteration though :-)

>
> Cheers
>
> Radim
>
> PS: And don't keep all the futures just to check that all tasks have
> been finished - use ExecutorAllCompletionService instead.
>
>> On 12/06/2013 05:18 PM, Mircea Markus wrote:
>> Thanks Vladimir, I like the hands-on approach!
>> Adding -dev, there's a lot of interest around the parallel M/R so I think
>> others will have some thoughts on it as well.
>>
>> So what you're basically doing in your branch is iterate over all the keys
>> in the cache and then for each key invoke the mapping in a separate thread.
>> Whilst this would work, I think it has some drawbacks:
>> - the iteration over the keys in the container happens in sequence, albeit
>> the mapping phases happening in parallel. This speeds things up a bit but
>> not as much as having the iteration
>> happening in parallel, especially when the mapper is fast, which I think
>> is pretty common.
>> - the StatelessTask + some smaller objects are being created for each
>> iterated key. That's a lot of noise for the GC imo
>>
>> I think delegating the parallel iteration to the DataContainer (similar to
>> AdvancedCacheLoader.process (Executor)) would be a better approach IMO:
>> - the logic is reusable for other components as well, such as querying (to
>> implement full-scan-like search, or a general purpose parallel iterator over
>> the keys)
>> - object creation is reduced
>> - the DefaultDataContainer uses an EquivalentConcurrentHashMapV8 for holding
>> the entries, which already supports parallel iteration so the heavy lifting
>> is already in place
>>
>>> On Dec 4, 2013, at 5:16 PM, Vladimir Blagojevic <vblag...@redhat.com> wrote:
>>>
>>> Here is my M/R parallel execution solution updated to master
>>> https://github.com/vblagoje/infinispan/tree/t_2284_new
>>>
>>> Now, I'll work on your solution which I am starting to like actually the
>>> more I think about it.
>>> Although I have to admit that I would eviscerate
>>> some of your interfaces like these KeyFilters into more prominent packages
>>> so we can all use the same interfaces. Also I would see if we can
>>> genericize some of your interfaces and implementations.
>>>
>>> Will keep you updated.
>>>
>>> Vladimir
>> Cheers,
>
>
> --
> Radim Vansa <rva...@redhat.com>
> JBoss DataGrid QA
>
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
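For reference, the suggestion in the thread to delegate parallel iteration to the container can be sketched with the JDK 8 `java.util.concurrent.ConcurrentHashMap`, used here as a stand-in for EquivalentConcurrentHashMapV8 (an assumption, the Infinispan class is not used below). The `Container` class and its `forEachParallel` method are hypothetical names for illustration and are not the DataContainer interface.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.BiConsumer;

// Hypothetical sketch; Container is not Infinispan's DataContainer.
final class ParallelIterationSketch {

    // Stand-in for a data container backed by a concurrent map.
    static final class Container<K, V> {
        private final ConcurrentHashMap<K, V> entries = new ConcurrentHashMap<>();

        void put(K key, V value) {
            entries.put(key, value);
        }

        // The map itself splits the traversal across worker threads, so no
        // per-key task object is allocated by the caller. A threshold of 1
        // asks for as much parallelism as the map is willing to use.
        void forEachParallel(BiConsumer<? super K, ? super V> action) {
            entries.forEach(1L, action);
        }
    }

    // Example "mapper": sums the lengths of all values.
    static long totalLength(Container<String, String> container) {
        LongAdder total = new LongAdder(); // safe to update from several threads
        container.forEachParallel((key, value) -> total.add(value.length()));
        return total.sum();
    }
}
```

The action passed to `forEachParallel` may run on several threads at once, so it must be thread-safe; that is why the sketch accumulates into a `LongAdder` rather than a plain field.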