Hmm... OK, so I'm on the right track with respect to how to do the
aggregation. I figured a tab might screw things up; will try.
Thanks,
C G
Ted Dunning <[EMAIL PROTECTED]> wrote:
Just use a tab instead of _.
Makes everything look uniform.
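To illustrate the tab suggestion, here is a minimal sketch. It assumes the job writes each record as the key, a tab, then the value, which is the default behaviour of Hadoop's TextOutputFormat. The class and method names (`TabKeyDemo`, `compositeKey`, `outputLine`) are hypothetical. If the key columns are joined with a tab, each written line already has one tab-separated column per field, so no separate split pass is needed.

```java
import java.util.Arrays;

// Hypothetical sketch: build the composite aggregation key with a tab instead of "_".
// Assumption: each result is written as key + "\t" + value (TextOutputFormat's default).
public class TabKeyDemo {

    // Join the key columns with a tab, e.g. ("A", "B") -> "A\tB".
    static String compositeKey(String... columns) {
        return String.join("\t", columns);
    }

    // Mimic one output record: the key, a tab, then the aggregated value.
    static String outputLine(String key, long value) {
        return key + "\t" + value;
    }

    public static void main(String[] args) {
        String line = outputLine(compositeKey("A", "B"), 14);
        // The line is already three tab-separated columns.
        System.out.println(Arrays.toString(line.split("\t"))); // [A, B, 14]
    }
}
```

One thing to keep in mind: a per-row total such as `A 35` has a single-column key, so those lines come out with two columns rather than three and may need to go to a different table.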
On 9/19/07 2:59 PM, "C G" wrote:
> Hi All:
>
> Please indulge an embarrassing question for which I am sure there is a
> simple answer.
>
> Consider an aggregator which takes input like:
>
> A A 2
> A B 5
> A C 10
> A A 4
> A B 9
> A D 5
>
> and returns
> A A 6
> A B 14
> A C 10
> A D 5
> A 35
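The following plain-Java sketch shows the aggregation described above and can be used to check expected results on small inputs. It does not use the Hadoop aggregation framework. `PairSums` and `aggregate` are hypothetical names, and the sketch assumes whitespace-separated records with a numeric third column. Pair keys join the first two columns with a tab, and each first-column value also accumulates a row total (for A: 6 + 14 + 10 + 5 = 35).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of what the aggregation computes; not Hadoop code.
public class PairSums {

    // Sums the third column per (first, second) pair and also per first column.
    // Keys are "A\tB" for a pair and "A" for a row total.
    static Map<String, Long> aggregate(String[] records) {
        Map<String, Long> sums = new TreeMap<>();
        for (String record : records) {
            String[] f = record.trim().split("\\s+");
            long value = Long.parseLong(f[2]);
            sums.merge(f[0] + "\t" + f[1], value, Long::sum); // pair sum
            sums.merge(f[0], value, Long::sum);               // row total
        }
        return sums;
    }

    public static void main(String[] args) {
        String[] input = {"A A 2", "A B 5", "A C 10", "A A 4", "A B 9", "A D 5"};
        aggregate(input).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```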
>
> I've been using the aggregation classes to do the above very easily. My
> question is more about best practices for representing the output. The
> solution I've used is to create new keys by concatenating the various values,
> like this:
>
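> import java.util.ArrayList;
> import org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorBaseDescriptor;
>
> public class distinctRowAggregatorDescriptor extends
>     ValueAggregatorBaseDescriptor
> {
>   public ArrayList generateKeyValPairs(Object key, Object val) {
>     String[] input = val.toString().split("\t");
>     ArrayList retv = new ArrayList();
>     // ...
>     // Pass the value as a string; a String cannot be cast to int,
>     // and the LongValueSum aggregator parses the value itself.
>     retv.add(generateEntry(LONG_VALUE_SUM, input[0] + "_" + input[1],
>         input[2]));
>     return retv;
>   }
> }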
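For comparison, here is a standalone sketch of the per-record step that `generateKeyValPairs` performs, written without the Hadoop classes. `RecordKeys` and `keyValPairs` are hypothetical names. The sketch assumes the elided lines also emit a per-row total, as the `A 35` output line suggests. It uses a tab in the composite key and parses the value with `Long.parseLong`, because a `String` cannot be cast to `int` in Java.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java sketch of the per-record key/value generation.
// Assumption: besides the pair sum, each record also contributes to a row total.
public class RecordKeys {

    // For "A\tB\t5", emit ("A\tB", 5) for the pair sum and ("A", 5) for the row total.
    static List<Map.Entry<String, Long>> keyValPairs(String record) {
        String[] input = record.split("\t");
        long value = Long.parseLong(input[2]); // parse the string; a cast cannot convert it
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>(input[0] + "\t" + input[1], value));
        pairs.add(new SimpleEntry<>(input[0], value));
        return pairs;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> e : keyValPairs("A\tB\t5")) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```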
>
> This works just fine, and creates output that looks like:
>
> A_B 14
>
> and then I have to make a pass over the output data to split the "keys" apart
> into something suitable for loading into a database. In my little
> prototype, the split-apart pass is just a script.
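The split-apart pass could also be written in Java. Below is a minimal sketch, assuming each output line consists of the `_`-joined key, a tab, then the value. `SplitKeys` and `toColumns` are hypothetical names. The second call shows why the approach feels fragile: any column value that itself contains `_` produces an extra column.

```java
import java.util.Arrays;

// Hypothetical sketch of the post-processing "split apart" pass.
// Assumption: each output line is key + "\t" + value, with key columns joined by "_".
public class SplitKeys {

    // "A_B\t14" -> [A, B, 14]
    static String[] toColumns(String line) {
        String[] kv = line.split("\t", 2);      // separate key from value
        String[] keyParts = kv[0].split("_");   // undo the "_" concatenation
        String[] columns = Arrays.copyOf(keyParts, keyParts.length + 1);
        columns[keyParts.length] = kv[1];       // append the value as the last column
        return columns;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toColumns("A_B\t14")));   // [A, B, 14]
        // Fragile: a column value containing "_" yields an extra column.
        System.out.println(Arrays.toString(toColumns("A_x_B\t14"))); // [A, x, B, 14]
    }
}
```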
>
> The approach above, which does produce correct results, seems inherently
> misguided/broken with respect to getting the final output formatted correctly.
>
> Can somebody buy me a vowel and show me how to go straight to a multiple
> column output format and avoid the embarrassing non-parallel split to produce
> my load files?
>
> Thanks,
> C G