I think you'll have to push that for 1.0 for now, then; 0.20.x doesn't have map-side joins. Yes, that is a blocker for what you're trying to do and for what Sebastian is trying to do for recommendations. I've already reimplemented the recommenders separately with these things, and it simplifies and speeds up the pipeline.
I'd be more against sticking with 0.20.x, except that there's evidently already some issue even getting *on* to 0.20.x in the code, and that is more important to address. The jump to 0.21.x is only a moderate increase in functionality, and taking advantage of it still requires rewriting everything. Maybe we should wait for an even bigger leap forward before rewriting everything.

Here's a summary of my recipe for dealing with this in 0.20.x.

First, while you can't have multiple mappers, you can have multiple input paths. So you can join two different inputs keyed by the same keys without trouble, typically with an identity Mapper. Of course, they have to have the same value class.

This is a problem if you want to join Xs and Ys keyed by the same key. One solution is to create an "XOrYWritable" which holds either an X or a Y. Then the jobs that output an X or a Y both output the same value type, XOrYWritable. See VectorOrPrefWritable for instance. The Reducer can then check each value to pick out the Xs and the Ys and get both.

In some cases you have to know the ordering, i.e. whether you'll get an X or a Y first. In that case you need some cleverness with the key. Instead of a VarLongWritable key, you need something like an "EntityJoinKey", which contains a long value (the ID) but also a boolean or integer that indicates an ordering. Say it adds a boolean called "before". It needs to implement WritableComparable and order by the ID value first, then by the before/after flag. It also needs to specify a Partitioner which sends keys with the same ID to the same reducer, regardless of the before/after flag.

This is fairly convenient because you have a clearer picture of which values come in on "before" keys and which come in after. It's definitely more complex, but it's doable.
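To make the EntityJoinKey recipe concrete, here is a rough, framework-free sketch (not Mahout's actual code). The classes only mirror Hadoop's shapes so they compile without Hadoop on the classpath: in a real 0.20.x job, EntityJoinKey would implement org.apache.hadoop.io.WritableComparable, XOrY would be a Writable like the "XOrYWritable" described above, and partition() would live in a Partitioner subclass registered with Job.setPartitionerClass. XOrY, JoinSimulation and shuffleAndReduce are hypothetical names, and X and Y are assumed to be a String and a double purely for illustration. The simulated reducer relies on all keys for one ID arriving consecutively at the same reducer, keeping the "before" values until the "after" key shows up; a grouping comparator would be another way to get both into one reduce() call.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Composite key: the join ID plus a flag that makes X values arrive before Y values. */
final class EntityJoinKey implements Comparable<EntityJoinKey> {
    long id;
    boolean before;

    EntityJoinKey() {} // no-arg constructor, which Hadoop needs to deserialize keys
    EntityJoinKey(long id, boolean before) { this.id = id; this.before = before; }

    // Same method shapes as Writable.write / Writable.readFields
    void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeBoolean(before);
    }

    void readFields(DataInput in) throws IOException {
        id = in.readLong();
        before = in.readBoolean();
    }

    // Sort by ID first; for equal IDs, "before" keys sort ahead of "after" keys.
    @Override
    public int compareTo(EntityJoinKey o) {
        int byId = Long.compare(id, o.id);
        return byId != 0 ? byId : Boolean.compare(o.before, before);
    }

    // Partitioner logic: hash the ID only, so both flags for one ID reach the same reducer.
    static int partition(EntityJoinKey key, int numPartitions) {
        return (Long.hashCode(key.id) & Integer.MAX_VALUE) % numPartitions;
    }
}

/** Tagged union standing in for an "XOrYWritable"; here X is a String and Y a double. */
final class XOrY {
    String x; // non-null when this holds an X
    Double y; // non-null when this holds a Y

    static XOrY ofX(String x) { XOrY v = new XOrY(); v.x = x; return v; }
    static XOrY ofY(double y) { XOrY v = new XOrY(); v.y = y; return v; }
    boolean isX() { return x != null; }

    void write(DataOutput out) throws IOException {
        out.writeBoolean(isX()); // tag first, then the payload
        if (isX()) out.writeUTF(x); else out.writeDouble(y);
    }

    void readFields(DataInput in) throws IOException {
        if (in.readBoolean()) { x = in.readUTF(); y = null; } else { y = in.readDouble(); x = null; }
    }
}

final class JoinSimulation {
    /** Mimics the shuffle and reduce for one partition; emits "id:x*y" per joined pair. */
    static List<String> shuffleAndReduce(List<Map.Entry<EntityJoinKey, XOrY>> mapOutput,
                                         int numPartitions, int partition) {
        // Shuffle: keep this partition's pairs, sorted by EntityJoinKey.compareTo
        TreeMap<EntityJoinKey, List<XOrY>> sorted = new TreeMap<>();
        for (Map.Entry<EntityJoinKey, XOrY> e : mapOutput) {
            if (EntityJoinKey.partition(e.getKey(), numPartitions) == partition) {
                sorted.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        // Reduce: one call per distinct key, in sorted order. Remember the Xs seen on the
        // "before" key for the current ID, then pair them with the Ys on its "after" key.
        List<String> joined = new ArrayList<>();
        List<String> xsForId = new ArrayList<>();
        Long currentId = null;
        for (Map.Entry<EntityJoinKey, List<XOrY>> group : sorted.entrySet()) {
            EntityJoinKey key = group.getKey();
            if (currentId == null || key.id != currentId) { currentId = key.id; xsForId.clear(); }
            for (XOrY v : group.getValue()) {
                if (key.before && v.isX()) {
                    xsForId.add(v.x);
                } else if (!key.before && !v.isX()) {
                    for (String x : xsForId) joined.add(key.id + ":" + x + "*" + v.y);
                }
            }
        }
        return joined;
    }
}
```

For example, if the map output holds a Y of 2.0 on key (7, after) and an X of "a" on key (7, before), in either order, then shuffleAndReduce(mapOutput, 2, 1) returns ["7:a*2.0"], because ID 7 hashes to partition 1 of 2 and the "before" key sorts first. The point of the design is that the sort order, not the reducer, guarantees the X is already in hand when the Y arrives.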
On Sun, May 22, 2011 at 8:20 PM, Shannon Quinn <[email protected]> wrote:
> What did you have in mind, then, for making matrix multiplication work
> without map-side joins (or at least, in the simple format available in
> 0.18)?
