Yes, we are leaning on the map-side join package quite heavily too - it is
an excellent addition to the MapReduce model that's proving really useful.
However, while HADOOP-5571 is an immediate problem for us, I can imagine
that we will probably want to join more than 64 files soon as well,
especially if we move onto larger clusters.
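
To make the 64-slot concern from my original message (quoted below) concrete,
here is a minimal sketch. It is not the actual TupleWritable code: the class
and method names (SlotTracking, LongSlots, BitSetSlots, setWritten, has) are
made up for illustration, and it only assumes, as I guessed in that message,
that written slots are tracked as bits in a single long. It compares that
approach with a java.util.BitSet.

```java
import java.util.BitSet;

/** Hypothetical illustration only; not the real TupleWritable implementation. */
public class SlotTracking {

    /** Tracks written slots in a single long, so only 64 distinct slots exist. */
    static final class LongSlots {
        private long written;

        void setWritten(int i) {
            // Java uses only the low 6 bits of the shift count for a long,
            // so (1L << 64) is the same value as (1L << 0).
            written |= 1L << i;
        }

        boolean has(int i) {
            return (written & (1L << i)) != 0;
        }
    }

    /** Tracks written slots in a java.util.BitSet, which grows as needed. */
    static final class BitSetSlots {
        private final BitSet written = new BitSet();

        void setWritten(int i) {
            written.set(i);
        }

        boolean has(int i) {
            return written.get(i);
        }
    }

    public static void main(String[] args) {
        LongSlots narrow = new LongSlots();
        narrow.setWritten(64); // aliases onto slot 0
        System.out.println("long:   slot 0 reported as written? " + narrow.has(0));

        BitSetSlots wide = new BitSetSlots();
        wide.setWritten(64); // stored as a genuine 65th slot
        System.out.println("BitSet: slot 0 reported as written? " + wide.has(0));
        System.out.println("BitSet: slot 64 reported as written? " + wide.has(64));
    }
}
```

With the long, marking slot 64 silently marks slot 0 instead, whereas the
BitSet keeps the two apart. The likely cost is some extra allocation and
serialization work per tuple, which may well be why a long was chosen in the
first place.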

2009/3/25 jason hadoop <[email protected]>

> That code is highly optimized and quite difficult to follow. We have always
> limited our joins to 31 members and ignored the problem.
> But I think your jira and fixing it are the correct choices.
>
> There is, in my opinion, a decent write up on how to use map side joins in
> chapter 8 of my book, so I suspect more people will use this soon, as map
> side join is an incredibly powerful tool.
>
> In one of our production applications it took the run time from 5+ hours to
> about 12 minutes.
>
> On Wed, Mar 25, 2009 at 7:23 AM, Jingkei Ly <[email protected]> wrote:
>
> > Am I right in thinking that the CompositeInputFormat is limited to joining
> > 64 files?
> >
> > I believe this comes about because TupleWritable uses a single long-type
> > instance field in order to maintain a bitset of tuple slots that have been
> > written to - I'm guessing this is for performance reasons, but it also
> > implies that the TupleWritable only has 64 bits to play with when joining.
> >
> > If my assumptions above are true, could replacing this long with a
> > java.util.BitSet be appropriate in terms of making the map-side join
> > package more scalable?
> >
>
>
>
> --
> Alpha Chapters of my book on Hadoop are available
> http://www.apress.com/book/view/9781430219422
>
