NamedVector is already supported in VectorWritable, do we need a new Writable?
Ah, well if VectorWritable can already support this, then there
definitely isn't any need for another writable. I took a look at VW
a while back and didn't see anything that could help; is there some sort
of a label I could use?
Is the issue that you are doing joins? Without CompositeInputFormat it's still
possible, and we use the pattern elsewhere. You need some cleverness with a
custom key and partitioner that will send key x from source A and key x from
source B to the same reducer, while the key carries a bit that indicates
whether the record came from A or B.
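
A minimal sketch of that pattern, in plain Java with no Hadoop dependency so it can be run on its own. The names TaggedJoinSketch, TaggedKey, SOURCE_A/SOURCE_B and partition are hypothetical, made up for illustration; in a real job the key would presumably be a WritableComparable and the partition logic would live in a Partitioner's getPartition(key, value, numPartitions).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class TaggedJoinSketch {

    /** Hypothetical composite key: the join key plus a tag for the source it came from. */
    static final class TaggedKey implements Comparable<TaggedKey> {
        static final int SOURCE_A = 0;
        static final int SOURCE_B = 1;

        final String joinKey;
        final int source;

        TaggedKey(String joinKey, int source) {
            this.joinKey = joinKey;
            this.source = source;
        }

        // Sort by join key first, then by source tag, so that for a given
        // join key the record from A is seen before the record from B.
        @Override
        public int compareTo(TaggedKey other) {
            int c = joinKey.compareTo(other.joinKey);
            return c != 0 ? c : Integer.compare(source, other.source);
        }

        @Override
        public String toString() {
            return joinKey + (source == SOURCE_A ? "@A" : "@B");
        }
    }

    /** Partition on the join key only; the source tag is deliberately ignored. */
    static int partition(TaggedKey key, int numReducers) {
        // Mask the sign bit so the result is never negative.
        return (key.joinKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        List<TaggedKey> keys = new ArrayList<>();
        keys.add(new TaggedKey("x", TaggedKey.SOURCE_B));
        keys.add(new TaggedKey("y", TaggedKey.SOURCE_A));
        keys.add(new TaggedKey("x", TaggedKey.SOURCE_A));
        Collections.sort(keys);
        for (TaggedKey k : keys) {
            System.out.println(k + " -> reducer " + partition(k, 4));
        }
    }
}
```

Because the partition depends only on the join key, x from A and x from B always go to the same reducer. In an actual Hadoop job you would likely also need a grouping comparator that compares only the join key (for example via JobConf.setOutputValueGroupingComparator), so that both tagged records arrive in a single reduce call.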
Yes, the issue is joins. I'm effectively trying to replace this one line
of code:
    conf.set("mapred.join.expr", CompositeInputFormat.compose(
        "inner", SequenceFileInputFormat.class, aPath, bPath));
If this can be done without CompositeInputFormat, or the partitioner can
be modified to definitively assign specific/custom keys and values to
specific nodes, then that would be perfect. Should I look into Hadoop's
Partitioner/MapPartitioner/MapTask classes for this, or is there
somewhere else I should look?
Thanks for the feedback!
Shannon