On Sun, Jan 2, 2011 at 8:48 AM, Shannon Quinn <[email protected]> wrote:
> Ah, well if VectorWritable can already support this, then there definitely
> isn't any need for another writable. I took a look at VW awhile back and
> didn't see anything that could help; is there some sort of a label I could
> use?
Yep, have a look at VectorWritable.write(), for example. It does handle
NamedVector.
> Yes, the issue is joins. I'm effectively trying to replace this one line of
> code:
>
> conf.set("mapred.join.expr", CompositeInputFormat.compose(
> "inner", SequenceFileInputFormat.class, aPath, bPath));
This may not be 100% what you are talking about, but this is my general recipe.
First, you can specify multiple input paths with
FileInputFormat.setInputPaths().
Say one path has (A,B) and the other has (A,C). You are trying to join
into (A,(B,C)).
What I do is create a "BOrCWritable" which holds either a B or a C.
Both paths then need to have been written out as (A,BOrC) already, by
whatever produced them. This is the really messy part, but in practice it
has not been terrible in the contexts where I've needed it.
Then your mapper is an identity mapper and the reducer will receive B
and C for each A, each inside a BOrC.
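To make the recipe concrete, here is a minimal sketch of what such a BOrCWritable could look like. It is an illustration under assumptions, not code from Mahout or Hadoop: B is taken to be a String and C a long purely as placeholders, the class uses only java.io so it runs on its own, and the join() helper is a hypothetical stand-in for the reducer body that assumes at most one B and one C per key. In a real job the class would be declared as implementing org.apache.hadoop.io.Writable, whose write(DataOutput) and readFields(DataInput) methods have the same signatures used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical tagged union: holds either a B or a C, never both.
// B = String and C = long are placeholder types chosen for illustration.
final class BOrCWritable {
    private boolean isB;
    private String b; // meaningful only when isB is true
    private long c;   // meaningful only when isB is false

    static BOrCWritable ofB(String b) {
        BOrCWritable w = new BOrCWritable();
        w.isB = true;
        w.b = b;
        return w;
    }

    static BOrCWritable ofC(long c) {
        BOrCWritable w = new BOrCWritable();
        w.isB = false;
        w.c = c;
        return w;
    }

    boolean isB() { return isB; }
    String getB() { return b; }
    long getC() { return c; }

    // Same signature as Writable.write: a one-byte tag, then the payload.
    public void write(DataOutput out) throws IOException {
        out.writeBoolean(isB);
        if (isB) {
            out.writeUTF(b);
        } else {
            out.writeLong(c);
        }
    }

    // Same signature as Writable.readFields: read the tag, then the payload,
    // and reset the unused field because instances may be reused.
    public void readFields(DataInput in) throws IOException {
        isB = in.readBoolean();
        if (isB) {
            b = in.readUTF();
            c = 0L;
        } else {
            c = in.readLong();
            b = null;
        }
    }

    // Helper for trying the class out: serialize and deserialize in memory.
    static BOrCWritable roundTrip(BOrCWritable w) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            w.write(new DataOutputStream(bytes));
            BOrCWritable copy = new BOrCWritable();
            copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for the reducer body: pick the B and the C out of the values
    // seen for one key A. Assumes at most one of each per key. The fields are
    // copied out right away rather than keeping references to the values.
    // Returns null when either side is missing (inner-join behaviour).
    static String join(String a, Iterable<BOrCWritable> values) {
        String b = null;
        Long c = null;
        for (BOrCWritable v : values) {
            if (v.isB()) {
                b = v.getB();
            } else {
                c = v.getC();
            }
        }
        return (b == null || c == null) ? null : "(" + a + ",(" + b + "," + c + "))";
    }
}
```

The tag byte is what lets a single value class flow through one identity mapper from both input paths; the reducer only has to ask each value which side it came from.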
If you need to control whether B or C comes first, it gets tougher,
since you need a custom wrapper key for A. But it's not terrible.
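One common way to build such a wrapper key is the pattern usually called a secondary sort; the sketch below assumes that approach, which may not be exactly what was meant above. TaggedKey, its tag constants, GROUP_BY_A, and partitionFor are hypothetical names, A is taken to be a String, and the class is plain Java so it can be run standalone. In a Hadoop job the key would be a WritableComparable, and the grouping and partitioning logic would be supplied through the job's grouping comparator and partitioner settings.

```java
import java.util.Comparator;

// Hypothetical wrapper key: the join key A plus a tag saying which side
// the accompanying value came from.
final class TaggedKey implements Comparable<TaggedKey> {
    static final int TAG_B = 0; // sorts first
    static final int TAG_C = 1; // sorts second

    final String a; // placeholder type for A
    final int tag;

    TaggedKey(String a, int tag) {
        this.a = a;
        this.tag = tag;
    }

    // Sort order: by A first, then by tag, so for each A the B record
    // is ordered ahead of the C record.
    @Override
    public int compareTo(TaggedKey other) {
        int byA = a.compareTo(other.a);
        return byA != 0 ? byA : Integer.compare(tag, other.tag);
    }

    // Grouping order: ignores the tag, so B and C records for the same A
    // are treated as one group.
    static final Comparator<TaggedKey> GROUP_BY_A =
        (x, y) -> x.a.compareTo(y.a);

    // Partitioning must also ignore the tag, otherwise the two sides of
    // one A could be sent to different reducers.
    static int partitionFor(TaggedKey key, int numPartitions) {
        return (key.a.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    @Override
    public String toString() {
        return a + "/" + (tag == TAG_B ? "B" : "C");
    }
}
```

The three pieces have to agree: sorting uses A and the tag, while grouping and partitioning use A alone. If partitioning included the tag, the B and C records for one A would not reliably meet in the same reduce call.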