Hi Prasad,

This is just streaming, a technique to complement what Hive SQL can express. Sometimes even this trick is not enough. For example, if I want to do a job like SecondarySort, can it be handled that way? My main question is how to schedule the two kinds of jobs, Hive queries and raw MapReduce.
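To make the SecondarySort question concrete, here is a rough sketch of what I imagine a TRANSFORM-based version would look like (all the table, column and script names are made up, and I am not sure this is the right pattern): DISTRIBUTE BY should send every row for one user_id to the same reducer script, and SORT BY should order each user's rows by ts before the script reads them.

  -- web_log, user_id, ts, url and sessionize.py are made-up placeholders.
  -- DISTRIBUTE BY should send all rows for one user_id to the same reducer
  -- script; SORT BY should hand them to the script ordered by ts.
  FROM (
    SELECT user_id, ts, url
    FROM web_log
    DISTRIBUTE BY user_id
    SORT BY user_id, ts
  ) t
  SELECT TRANSFORM (t.user_id, t.ts, t.url)
    USING 'python sessionize.py'
    AS (user_id, session_id, url);

If TRANSFORM with DISTRIBUTE BY/SORT BY really gives that ordering guarantee, then only the scheduling question remains.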
On Mon, Feb 23, 2009 at 11:47 AM, Prasad Chakka <[email protected]> wrote:
> You can use custom mapper and reducer scripts using the TRANSFORM/MAP/REDUCE
> facilities. Check the wiki on how to use them. Or do you want something
> different?
>
>
> ------------------------------
> From: Min Zhou <[email protected]>
> Reply-To: <[email protected]>
> Date: Sun, 22 Feb 2009 19:42:50 -0800
> To: <[email protected]>
> Subject: How to simplify our development flow under the means of using
> Hive?
>
>
> Hi list,
>
> I'm going to take Hive into production to analyze our web logs, which are
> hundreds of gigabytes per day. Previously, we did this job with Apache
> Hadoop, running our own raw MapReduce code. It worked, but it also cut into
> our productivity: we kept writing code with similar logic again and again.
> It gets even worse when the format of our logs changes. For example, if we
> want to insert one more field into each line of the log, the previous work
> becomes useless and we have to redo it. Hence we are thinking about using
> Hive as a persistence layer, so that we can store and retrieve the schemas
> of the data easily. But we found that sometimes Hive cannot handle certain
> kinds of complex analysis because of the limited expressive power of SQL.
> We have to write our own UDFs, and even then there are things Hive still
> cannot do. So we also need to write raw MapReduce code, which brings up
> another problem: since one part is a set of SQL scripts and the other is
> Java or hybrid code, how do we coordinate Hive and raw MapReduce code, and
> how do we schedule them? How does Facebook use Hive? And what is your
> solution when you come across similar problems?
>
> In the end, we are considering using Hive as our data warehouse. Any
> suggestions?
>
> Thanks in advance!
> Min
>
> --
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
>
> http://coderplay.javaeye.com

Regards,
Min

--
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com
