Hello all,

I need to run a calculation that merges two very large data sets (basically computing a variance). One set contains the "means"; the second contains objects, each tied to one of those means.

Normally I would ship the set of means to the mappers via the distributed cache, but the set has become too large to keep in memory and it is going to keep growing.

I would like to join the two data files so that each mapper receives the entries from both files that share the same key. I have seen that there is a CompositeInputFormat, but I could not find any real documentation on it.
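
To make the intent concrete, here is a minimal sketch in plain Java (no Hadoop classes involved) of the join I have in mind. It assumes both files are already sorted by key and that the means file has exactly one entry per key; the class name MergeJoinSketch, the method varianceByKey and the sample data are made up purely for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeJoinSketch {

    // Hypothetical helper: walks two key-sorted lists in lockstep.
    // means:  one (key, mean) entry per key, sorted by key.
    // values: any number of (key, value) entries per key, sorted by key.
    // Returns key -> population variance for keys present in both inputs.
    static Map<String, Double> varianceByKey(List<Map.Entry<String, Double>> means,
                                             List<Map.Entry<String, Double>> values) {
        Map<String, Double> result = new LinkedHashMap<>();
        int m = 0;
        int v = 0;
        while (m < means.size() && v < values.size()) {
            String meanKey = means.get(m).getKey();
            int cmp = meanKey.compareTo(values.get(v).getKey());
            if (cmp < 0) {
                m++; // a mean with no matching values: skip it
            } else if (cmp > 0) {
                v++; // a value with no matching mean: skip it
            } else {
                double mean = means.get(m).getValue();
                double sumSq = 0.0;
                long count = 0;
                // Consume every value that shares the current key.
                while (v < values.size() && values.get(v).getKey().equals(meanKey)) {
                    double d = values.get(v).getValue() - mean;
                    sumSq += d * d;
                    count++;
                    v++;
                }
                result.put(meanKey, sumSq / count);
                m++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data, sorted by key in both lists.
        List<Map.Entry<String, Double>> means = new ArrayList<>();
        means.add(new SimpleEntry<>("a", 2.0));
        means.add(new SimpleEntry<>("b", 10.0));
        means.add(new SimpleEntry<>("c", 5.0));

        List<Map.Entry<String, Double>> values = new ArrayList<>();
        values.add(new SimpleEntry<>("a", 1.0));
        values.add(new SimpleEntry<>("a", 3.0));
        values.add(new SimpleEntry<>("b", 10.0));
        values.add(new SimpleEntry<>("d", 7.0));

        System.out.println(varianceByKey(means, values)); // prints {a=1.0, b=0.0}
    }
}
```

Only one mean and one running sum are held in memory at any time, which is the property I am after; what I do not know is how to get the same pairing of records inside a mapper.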

Can anyone enlighten me on whether it would be useful here, and how it works?

Cheers,
Christian
