About Zebra

2010-05-21 Thread Renato Marroquín Mogrovejo
Hi out there! Is there any other documentation, such as papers or articles, about Zebra and/or its use? Thanks in advance. Renato M.

About PIG and PNUTS

2010-05-28 Thread Renato Marroquín Mogrovejo
Hi PIG users! I was reading about PIG and PNUTS and started wondering how these two are related. I mean, are there any applications where these technologies are used together? Or any project on how to integrate them? Thanks. Renato M.

Help with a tricky query

2010-06-09 Thread Renato Marroquín Mogrovejo
Hi everyone, today I came across a particular query that I don't know how to model in PIG. Part of my data looks like this:

Id1     Id2     Sc   Va      P1   P2
------  ------  ---  ------  ---  --
770011  990201  401  1e-125  100  65
990201  770011  440  1e-125  100  42
770011  770083  524  1e-120  89   12
770083

Re: Help with a tricky query

2010-06-10 Thread Renato Marroquín Mogrovejo
, *; J = join P1 by (id1, id2), P2 by (id1,id2); and now J contains pairs of rows from original table where id1 and id2 are reversed. is this what you want? On Wed, Jun 9, 2010 at 6:54 PM, Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com wrote: Hi everyone, today
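The join described above can be sketched outside Pig as follows. This is only an illustration of the logic in plain Python, not the poster's script: it assumes the truncated part of the message builds a second copy of the table with id1 and id2 swapped, and the function name `reversed_pairs` is hypothetical. The rows are the three complete rows from the sample data earlier in the thread.

```python
# Rows are (id1, id2, sc, va, p1, p2), copied from the sample data in the thread.
rows = [
    (770011, 990201, 401, 1e-125, 100, 65),
    (990201, 770011, 440, 1e-125, 100, 42),
    (770011, 770083, 524, 1e-120, 89, 12),
]


def reversed_pairs(rows):
    """Pair each row with every row whose (id1, id2) equals its (id2, id1)."""
    # Index the rows by their (id1, id2) key, like one side of a hash join.
    index = {}
    for row in rows:
        index.setdefault((row[0], row[1]), []).append(row)

    joined = []
    for row in rows:
        # Look up the swapped key to find the reciprocal row(s), if any.
        for match in index.get((row[1], row[0]), []):
            joined.append((row, match))
    return joined


pairs = reversed_pairs(rows)
for left, right in pairs:
    print(left, right)
```

As with the join, each reciprocal pair shows up twice, once in each direction, and the third row has no partner in this sample.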

Re: Help with a tricky query

2010-06-12 Thread Renato Marroquín Mogrovejo
your data before I can offer too much advice. On Thu, Jun 10, 2010 at 5:38 PM, Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com wrote: Hi everybody, thanks a lot for your responses. I am actually not looking for a transitive closure, I am not trying to infer

Re: Help with a tricky query

2010-06-22 Thread Renato Marroquín Mogrovejo
-a) In that case, you could sort the contents of the triples and group on the result, saving only those results that have 1 entry in the group. This would be faster as you would need to shuffle only a single copy of the data. -D On Sat, Jun 12, 2010 at 10:39 PM, Renato Marroquín Mogrovejo
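The sort-and-group idea can be sketched as follows, again in plain Python rather than Pig. The id pairs from the sample data earlier in the thread stand in for the triples mentioned here, and `group_by_sorted_ids` is a hypothetical name. Sorting the ids gives both directions of a pair the same key, so a single grouping pass is enough; filtering on group size is left to the caller.

```python
from collections import defaultdict

# Rows are (id1, id2, sc, va, p1, p2), copied from the sample data in the thread.
rows = [
    (770011, 990201, 401, 1e-125, 100, 65),
    (990201, 770011, 440, 1e-125, 100, 42),
    (770011, 770083, 524, 1e-120, 89, 12),
]


def group_by_sorted_ids(rows):
    """Group rows under a key made of their two ids in sorted order."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(sorted(row[:2]))  # (a, b) and (b, a) map to the same key
        groups[key].append(row)
    return dict(groups)


groups = group_by_sorted_ids(rows)
for key, members in groups.items():
    print(key, len(members))
```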

Re: verifying that pig is talking to hadoop cluster

2010-07-01 Thread Renato Marroquín Mogrovejo
Viner On Wed, Jun 30, 2010 at 11:29 PM, Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com wrote: Hi Dave, The same happened to me because even though we are not supposed to set the env variables for PIG, it needs them. So go to your sh file and edite

Sorting a tuple's content

2010-07-21 Thread Renato Marroquín Mogrovejo
Hey everybody, does anybody know how I can sort a tuple's content? For example, I have (770001,880001,990001,770001) and I would like to obtain (770001,770001,880001,990001). I tried doing a group by on the first field, but I still get the whole tuple as the resulting bag. Thanks in

Re: Sorting a tuple's content

2010-07-22 Thread Renato Marroquín Mogrovejo
Thanks there Dmitriy. I will write my own then. Renato M. 2010/7/21 Dmitriy Ryaboy dvrya...@gmail.com that has to be a UDF, there is nothing built in for this. On Wed, Jul 21, 2010 at 6:33 PM, Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com wrote: Hey everybody, Does any body
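The logic such a UDF would need is small. The sketch below shows it in plain Python using the example tuple from the question; it is not a Pig UDF and does not use Pig's UDF interface, and the name `sort_tuple` is hypothetical.

```python
def sort_tuple(fields):
    """Return a new tuple holding the same fields in ascending order."""
    return tuple(sorted(fields))


result = sort_tuple((770001, 880001, 990001, 770001))
print(result)  # (770001, 770001, 880001, 990001)
```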

Re: Sorting a tuple's content

2010-07-25 Thread Renato Marroquín Mogrovejo
things. On 7/22/10 11:56 PM, Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com wrote: Thanks there Dmitriy. I will write my own then. Renato M. 2010/7/21 Dmitriy Ryaboy dvrya...@gmail.com that has to be a UDF, there is nothing built in for this. On Wed, Jul 21, 2010

Re: COUNT(A.field1)

2010-08-28 Thread Renato Marroquín Mogrovejo
Hi, this is also interesting and kinda confusing for me too (= From the db world, the second one would have better performance, but Pig doesn't keep statistics on the data, so it has to read the whole file anyway, and since the count operation is mainly done on the map side, all attributes will

Re: COUNT(A.field1)

2010-09-02 Thread Renato Marroquín Mogrovejo
here (COUNT is a udf): so the entire tuple is deserialized from input. Of course, the performance difference, as Dmitriy noted, would not be very high. Regards, Mridul On Sunday 29 August 2010 01:14 AM, Renato Marroquín Mogrovejo wrote: Hi, this is also interesting and kinda confusing

Re: COUNT(A.field1)

2010-09-03 Thread Renato Marroquín Mogrovejo
at 2:51 PM, Renato Marroquín Mogrovejo renatoj.marroq...@gmail.com wrote: So in terms of performance it is the same whether I count just a single column or the whole data set, right? But what Thejas said about the loader having optimizations (selective deserialization or columnar storage) is something

Re: PIG Newbie question: error using substring

2010-09-21 Thread Renato Marroquín Mogrovejo
Hi Ravi, you have to register the piggybank jar at the beginning of your pig script. {code} REGISTER /workspace/pig-test/contrib/piggybank/java/piggybank.jar; ... {/code} If you don't have it, you will have to build it from source using the ant command, and then import it. Renato M.