Hello,
I am trying to filter the tuples in a bag that is generated by a sequence
of operations in Pig. My data looks like this:
        (0,{(0,8),(0,1),(0,6),(0,7),(0,4)})
        (1,{(1,6),(1,7),(1,8),(1,4)})
        (4,{(4,6),(4,8),(4,7)})
        (6,{(6,8),(6,7)})
        (7,{(7,8)})
This relation is stored in R4. Running DESCRIBE on it gives:
        R4: {group: int,R3: {R::b: int,R1::b1: int}}

I want to filter the inner bag so that only the tuple with the smallest
difference (b1 - b) is kept and all the others are filtered out. For
example, the desired output would be:
        (0,{(0,1)})
        (1,{(1,4)})
        (4,{(4,6)})
        (6,{(6,7)})
        (7,{(7,8)})

I tried doing it like this:
        R5 = foreach R4 {
            R6 = filter R3 by MIN(b1-b);
            generate group;
        }
I also tried some other approaches, but then realized this was not the
proper way to do it, and I got stuck. I considered writing a UDF for it,
but it would be great if I could do it in Pig itself. Can anyone help me
out with this?
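One possible plain-Pig approach, as a minimal sketch rather than a tested answer: inside a nested FOREACH over R4, attach the difference to every tuple of the inner bag, sort the bag by that difference, and keep the first tuple with LIMIT 1. It assumes the R4 schema shown by DESCRIBE above and a Pig version that allows FOREACH inside a nested block (0.10 or later, as far as I know). The aliases withDiff, sorted and closest are hypothetical names chosen for this example.

```pig
-- Sketch: keep only the tuple with the smallest (b1 - b) in each inner bag.
-- Assumes R4: {group: int, R3: {R::b: int, R1::b1: int}} as reported by DESCRIBE.
-- withDiff, sorted and closest are hypothetical aliases used only here.
R5 = FOREACH R4 {
    -- attach the difference to every tuple of the inner bag
    withDiff = FOREACH R3 GENERATE R::b AS b, R1::b1 AS b1, (R1::b1 - R::b) AS diff;
    -- smallest difference first
    sorted = ORDER withDiff BY diff ASC;
    -- keep only the first tuple of the sorted bag
    closest = LIMIT sorted 1;
    -- project away the helper diff column
    GENERATE group, closest.(b, b1);
}
DUMP R5;
```

Note that LIMIT 1 keeps exactly one tuple per group, so if several tuples tie for the smallest difference, which one survives is not guaranteed. If all tied tuples are needed, filtering the bag against the minimum difference would be the alternative.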

Thanks,
Dhaval Deshpande.
