Pig folks: it seems to defy expectations if TOBAG is run on a single tuple and you don't get a bag. I can patch it, but does that seem like a fair change?

2012/4/4 Eli Finkelshteyn <iefin...@gmail.com>

> Nah, doesn't work because it doubles up the tuple, so that:
>
> TOBAG(('hello', 'howdy', 'hi'))
>
> returns
>
> {(('hello', 'howdy', 'hi'))}
>
> And so,
>
> FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2))
>
> gets me
>
> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>
> which is just what I started with.
>
> Anyway, to solve this problem, what I did was make a quick python udf to
> make a bag from a tuple without doubling up the tuple, and then ran FLATTEN
> on that, which looks like:
>
> bagged = FOREACH split_set GENERATE FLATTEN(py_udfs.tupleToBag(t1)),
>     FLATTEN(py_udfs.tupleToBag(t2));
>
> Where the Python udf I'm using is:
>
> @outputSchema("b:bag{}")
> def tupleToBag(tup):
>     b = [tupify(i) for i in tupify(tup)]
>     return b
>
> def tupify(tup):
>     if isinstance(tup, tuple):
>         return tup
>     return (tup,)
>
> I'll add that into Python PiggyBank as soon as I get a chance to finish
> that stuff up.
>
> Eli
>
> On 4/4/12 2:43 PM, Jonathan Coveney wrote:
>
>> FLATTEN(TOBAG(t1)), FLATTEN(TOBAG(t2)) should give you the cross
>>
>> 2012/4/4 Eli Finkelshteyn <iefin...@gmail.com>
>>
>>> That's for a relation only. Unless I'm missing something, it does not
>>> work for tuples. What I'm doing would require a FOREACH, I'm thinking.
>>>
>>> Eli
>>>
>>> On 4/4/12 2:24 PM, Prashant Kommireddi wrote:
>>>
>>>> http://pig.apache.org/docs/r0.9.1/basic.html#cross
>>>>
>>>> -Prashant
>>>>
>>>> On Wed, Apr 4, 2012 at 11:18 AM, Eli Finkelshteyn <iefin...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Folks,
>>>>>
>>>>> I'm currently trying to do something I figured would be trivial, but
>>>>> actually wound up being a bit of work for me, so I'm wondering if I'm
>>>>> missing something. All I want to do is get a cross product of two
>>>>> tuples. So for example, given an input of:
>>>>>
>>>>> ('hello', 'howdy', 'hi'), ('hola', 'bonjour')
>>>>>
>>>>> I'd get:
>>>>>
>>>>> ('hello', 'hola')
>>>>> ('hello', 'bonjour')
>>>>> ('howdy', 'hola')
>>>>> ('howdy', 'bonjour')
>>>>> ('hi', 'hola')
>>>>> ('hi', 'bonjour')
>>>>>
>>>>> At first, I figured I could FLATTEN(TOBAG(tuple1, tuple2)), but that's
>>>>> no good cause the tuples are first themselves put into new tuples. So,
>>>>> what I'm left with now is writing a dirty and slow python udf for this.
>>>>> Is there really no better way to do this? I'd think it would be a
>>>>> pretty standard task.
>>>>>
>>>>> Eli
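
For anyone who wants to check the logic outside Pig: below is a minimal plain-Python sketch of what the quoted UDF does and of the cross product that two FLATTENs over its output are meant to produce. It only models the behavior described in this thread and does not run inside Pig. The snake_case names `tuple_to_bag` and `cross_tuples` are hypothetical, and `itertools.product` stands in for Pig's FLATTEN-based cross.

```python
from itertools import product


def tupify(value):
    """Return value unchanged if it is already a tuple, else wrap it in a 1-tuple."""
    if isinstance(value, tuple):
        return value
    return (value,)


def tuple_to_bag(tup):
    """Model a bag as a list holding one 1-field tuple per field of the input tuple."""
    return [tupify(item) for item in tupify(tup)]


def cross_tuples(t1, t2):
    """Pair every field of t1 with every field of t2 (stand-in for the two FLATTENs)."""
    return [left + right for left, right in product(tuple_to_bag(t1), tuple_to_bag(t2))]


if __name__ == "__main__":
    for pair in cross_tuples(('hello', 'howdy', 'hi'), ('hola', 'bonjour')):
        print(pair)
```

With the example input from the original question, this yields the six pairs listed there, in the same order.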