That code is highly optimized and quite difficult to follow. We have always
limited our joins to 31 members and ignored the problem.
But I think opening your JIRA and fixing the limit are the right choices.

There is, in my opinion, a decent write-up on how to use map-side joins in
chapter 8 of my book, so I suspect more people will start using them soon;
map-side join is an incredibly powerful tool.

In one of our production applications, it cut the run time from 5+ hours to
about 12 minutes.

On Wed, Mar 25, 2009 at 7:23 AM, Jingkei Ly <[email protected]> wrote:

> Am I right in thinking that the CompositeInputFormat is limited to joining
> 64 files?
>
> I believe this comes about because TupleWritable uses a single long
> instance field to maintain a bitset of the tuple slots that have been
> written to. I'm guessing this is for performance reasons, but it also
> means that a TupleWritable has only 64 bits to play with when joining.
>
> If my assumptions above are true, would replacing this long with a
> java.util.BitSet be appropriate for making the map-side join package
> more scalable?
>
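To make the trade-off in the question concrete, here is a minimal sketch in plain Java. It is not Hadoop's actual TupleWritable code; the class and method names (`SlotTracking`, `LongSlots`, `BitSetSlots`, `setWritten`, `has`) are hypothetical. It compares a single-`long` mask of written slots, which caps out at 64, with a growable `java.util.BitSet`. It also notes a Java pitfall that can make a `long` mask misbehave earlier than 64: shifting an `int` literal (`1 << i`) instead of a `long` one (`1L << i`).

```java
import java.util.BitSet;

/**
 * Hypothetical sketch (not Hadoop's TupleWritable): two ways to track
 * which tuple slots have been written.
 */
public class SlotTracking {

    /** Fixed-width tracker: a single long holds at most 64 slot flags. */
    static final class LongSlots {
        private long written = 0L;

        void setWritten(int i) {
            if (i < 0 || i >= Long.SIZE) {
                throw new IndexOutOfBoundsException("slot " + i + " does not fit in a 64-bit mask");
            }
            // Must shift 1L, not 1: the int expression (1 << 31) is negative and
            // sign-extends when widened to long, and (1 << 32) wraps back to 1.
            written |= 1L << i;
        }

        boolean has(int i) {
            return i >= 0 && i < Long.SIZE && (written & (1L << i)) != 0;
        }
    }

    /** Growable tracker: java.util.BitSet expands as higher bits are set. */
    static final class BitSetSlots {
        private final BitSet written = new BitSet();

        void setWritten(int i) {
            written.set(i);
        }

        boolean has(int i) {
            return written.get(i);
        }
    }

    public static void main(String[] args) {
        LongSlots longSlots = new LongSlots();
        longSlots.setWritten(63);                 // last slot a long can hold
        System.out.println(longSlots.has(63));

        BitSetSlots bitSetSlots = new BitSetSlots();
        bitSetSlots.setWritten(100);              // well past 64 slots
        System.out.println(bitSetSlots.has(100));
        System.out.println(bitSetSlots.has(99));

        try {
            longSlots.setWritten(64);             // no 65th bit available
        } catch (IndexOutOfBoundsException e) {
            System.out.println("long mask is full at 64 slots");
        }
    }
}
```

The `long` version is a single primitive with cheap bit operations, which fits the performance guess in the question. A `BitSet` removes the fixed cap at the cost of an extra object per tuple, and a real Writable would also need to serialize it in its own `write(DataOutput)` and `readFields(DataInput)` methods.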



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
