Hello everyone,

I have started using Infinispan's MapReduce (MR) implementation and have come across some possible limitations, so I would like to make a couple of suggestions.

Depending on the algorithm, there can be memory problems, especially with intermediate results. One example is group by. Suppose we have a cluster of 2 nodes with 2 GB of memory available, and a distributed cache that stores simple car objects (id, brand, colour) with a total data size of 3.5 GB. If all objects have the same colour, then all 3.5 GB would go to a single reducer, and as a result an OutOfMemoryError would be thrown.
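To make the scenario concrete, here is a minimal sketch in plain Java. It does not use the Infinispan API at all; the class GroupBySkewDemo and its methods sampleCars and groupByColour are names made up for illustration, and the in-memory map only stands in for the intermediate results of a group by. It shows that when every car has the same colour, all values end up under one key, and therefore with one reducer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of Infinispan.
public class GroupBySkewDemo {

    /** Simple car object as described above: (id, brand, colour). */
    static final class Car {
        final int id;
        final String brand;
        final String colour;

        Car(int id, String brand, String colour) {
            this.id = id;
            this.brand = brand;
            this.colour = colour;
        }
    }

    /** Builds n cars that all share the same colour (the worst case for group by). */
    static List<Car> sampleCars(int n) {
        List<Car> cars = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            cars.add(new Car(i, "brand-" + (i % 3), "red"));
        }
        return cars;
    }

    /**
     * Simulates the intermediate state of a group by: every car is emitted
     * under its colour, and all values for one key are collected together,
     * as they would be before being handed to a single reducer.
     */
    static Map<String, List<Car>> groupByColour(List<Car> cars) {
        Map<String, List<Car>> intermediate = new HashMap<>();
        for (Car car : cars) {
            intermediate.computeIfAbsent(car.colour, k -> new ArrayList<>()).add(car);
        }
        return intermediate;
    }

    public static void main(String[] args) {
        Map<String, List<Car>> groups = groupByColour(sampleCars(1000));
        // One line per key; with a single colour there is only one key.
        for (Map.Entry<String, List<Car>> e : groups.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().size() + " values for one reducer");
        }
    }
}
```

With a single colour the map holds exactly one key whose value list contains every car, which is the skew described above, only at a much smaller scale.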
To overcome this limitation, I propose adding the name of the intermediate cache as a parameter of the MR task. This would make it possible to create a custom-configured cache that deals with the memory limitations.

Another feature I would like to have is the ability to set the name of the output cache. The reasoning is similar to the above.

I look forward to your thoughts on these two suggestions.

Regards,
Evangelos
