Hi out there! Is there any other documentation, such as papers or articles,
about Zebra and/or its use?
Thanks in advance.
Renato M.
Hi PIG users!
I was reading about PIG and PNUTS and started wondering how these two are
related. I mean, are there any applications where these technologies are used
together? Or any project on how to integrate them? Thanks.
Renato M.
Hi everyone, today I came across a particular query that I don't know
how to model in PIG. Part of my data looks like this:
Id1     Id2     Sc   Va      P1   P2
------  ------  ---  ------  ---  --
770011  990201  401  1e-125  100  65
990201  770011  440  1e-125  100  42
770011  770083  524  1e-120   89  12
770083
, *;
J = join P1 by (id1, id2), P2 by (id1,id2);
and now J contains pairs of rows from the original table where id1 and id2
are reversed.
Is this what you want?
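For readers who want to check what such a join returns, here is a minimal sketch of the same idea in plain Python rather than Pig Latin. The function name `reversed_pairs` and the three-field rows are assumptions made for illustration; the rows are the first three columns of the sample data quoted earlier in the thread.

```python
def reversed_pairs(rows):
    """Pair each row with every row whose (id1, id2) is its (id2, id1)."""
    # Index rows by their (id1, id2) key, like one side of the join.
    by_key = {}
    for row in rows:
        by_key.setdefault((row[0], row[1]), []).append(row)

    pairs = []
    for row in rows:
        # Look up the swapped key, like the other side of the join.
        for other in by_key.get((row[1], row[0]), []):
            pairs.append((row, other))
    return pairs


# Hypothetical sample: id1, id2, score (other columns omitted).
rows = [
    (770011, 990201, 401),
    (990201, 770011, 440),
    (770011, 770083, 524),
]

for left, right in reversed_pairs(rows):
    print(left, right)
```

Note that each matching pair shows up twice, once in each order, which is also what a self-join of this kind would produce.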
On Wed, Jun 9, 2010 at 6:54 PM, Renato Marroquín Mogrovejo
renatoj.marroq...@gmail.com wrote:
Hi everyone, today
your data before I can offer too
much advice.
On Thu, Jun 10, 2010 at 5:38 PM, Renato Marroquín Mogrovejo
renatoj.marroq...@gmail.com wrote:
Hi everybody, thanks a lot for your responses.
I am actually not looking for a transitive closure; I am not trying to
infer
-a)
In that case, you could sort the contents of the triples and group on the
result, saving only those results that have 1 entry in the group. This
would be faster as you would need to shuffle only a single copy of the
data.
-D
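The sort-then-group suggestion above can be sketched in plain Python (not Pig Latin) to show the logic. This is only an illustration under assumptions: the name `unpaired_rows` is made up, the key is taken to be the first two fields of each row, and the sample rows come from the data quoted earlier in the thread.

```python
from collections import defaultdict


def unpaired_rows(rows):
    """Group rows on their sorted (id1, id2) key; keep groups with one entry."""
    groups = defaultdict(list)
    for row in rows:
        # Sorting the ids makes (a, b) and (b, a) land in the same group.
        key = tuple(sorted(row[:2]))
        groups[key].append(row)
    # A group of size 1 means the reversed pair never appeared.
    return [members[0] for members in groups.values() if len(members) == 1]


# Hypothetical sample: id1, id2, score (other columns omitted).
rows = [
    (770011, 990201, 401),
    (990201, 770011, 440),
    (770011, 770083, 524),
]

print(unpaired_rows(rows))  # [(770011, 770083, 524)]
```

Only one pass over the data is needed here, which mirrors the point about shuffling a single copy instead of joining the table with itself.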
On Sat, Jun 12, 2010 at 10:39 PM, Renato Marroquín Mogrovejo
Viner
On Wed, Jun 30, 2010 at 11:29 PM, Renato Marroquín Mogrovejo
renatoj.marroq...@gmail.com wrote:
Hi Dave,
The same happened to me because even though we are not supposed to set the
env variables for PIG, it needs them. So go to your sh file and edite
Hey everybody, does anybody know how I can sort a tuple's content?
For example, I have (770001,880001,990001,770001) and I would like to obtain
(770001,770001,880001,990001). I tried doing a group by the first field, but
the thing is that I still get the whole tuple as the resulting bag.
Thanks in
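To make the desired result concrete, this is the operation in plain Python, shown only as an illustration of what a user-defined function would need to compute. The name `sort_tuple` is hypothetical and is not part of Pig or piggybank.

```python
def sort_tuple(t):
    """Return a new tuple with the same fields in ascending order."""
    return tuple(sorted(t))


print(sort_tuple((770001, 880001, 990001, 770001)))
# (770001, 770001, 880001, 990001)
```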
Thanks there Dmitriy. I will write my own then.
Renato M.
2010/7/21 Dmitriy Ryaboy dvrya...@gmail.com
that has to be a UDF, there is nothing built in for this.
On Wed, Jul 21, 2010 at 6:33 PM, Renato Marroquín Mogrovejo
renatoj.marroq...@gmail.com wrote:
Hey everybody, does anybody
things.
On 7/22/10 11:56 PM, Renato Marroquín Mogrovejo
renatoj.marroq...@gmail.com wrote:
Thanks there Dmitriy. I will write my own then.
Renato M.
2010/7/21 Dmitriy Ryaboy dvrya...@gmail.com
that has to be a UDF, there is nothing built in for this.
On Wed, Jul 21, 2010
Hi, this is also interesting and kinda confusing for me too (=
From the db world, the second one would have better performance, but Pig
doesn't save statistics on the data, so it has to read the whole file
anyway, and since the count operation is mainly done on the map side, all
attributes will
here
(COUNT is a UDF), so the entire tuple is deserialized from input.
Of course, the performance difference, as Dmitriy noted, would not be very
high.
Regards,
Mridul
On Sunday 29 August 2010 01:14 AM, Renato Marroquín Mogrovejo wrote:
Hi, this is also interesting and kinda confusing
at 2:51 PM, Renato Marroquín Mogrovejo
renatoj.marroq...@gmail.com wrote:
So in terms of performance, it is the same whether I count just a single
column or the whole data set, right?
But what Thejas said about the loader having optimizations (selective
deserialization or columnar storage) is something
Hi Ravi,
you have to register the piggybank jar at the beginning of your pig script.
{code}
REGISTER /workspace/pig-test/contrib/piggybank/java/piggybank.jar;
...
{/code}
If you don't have it, you will have to build it from source using the ant
command, and then import it.
Renato M.