Hi Prasad,

This is just streaming, a technique to complement what Hive SQL can express. Sometimes even this trick is not enough. For example, if I want to do a job like SecondarySort, can it be handled that way? My main question is how to schedule the two kinds of jobs, Hive queries and raw MapReduce.
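To make the SecondarySort question concrete, here is a rough sketch of what I imagine a TRANSFORM-based version would look like (all the table, column and script names are made up, and I am not sure this is the right pattern): DISTRIBUTE BY should send every row for one user_id to the same reducer script, and SORT BY should order each user's rows by ts before the script reads them.

  -- web_log, user_id, ts, url and sessionize.py are made-up placeholders.
  -- DISTRIBUTE BY should send all rows for one user_id to the same reducer
  -- script; SORT BY should hand them to the script ordered by ts.
  FROM (
    SELECT user_id, ts, url
    FROM web_log
    DISTRIBUTE BY user_id
    SORT BY user_id, ts
  ) t
  SELECT TRANSFORM (t.user_id, t.ts, t.url)
    USING 'python sessionize.py'
    AS (user_id, session_id, url);

If TRANSFORM with DISTRIBUTE BY/SORT BY really gives that ordering guarantee, then only the scheduling question remains.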
On Mon, Feb 23, 2009 at 11:47 AM, Prasad Chakka <[email protected]> wrote:
> You can use custom mapper and reducer scripts using the TRANSFORM/MAP/REDUCE
> facilities. Check the wiki on how to use them. Or do you want something
> different?
>
>
> ------------------------------
> From: Min Zhou <[email protected]>
> Reply-To: <[email protected]>
> Date: Sun, 22 Feb 2009 19:42:50 -0800
> To: <[email protected]>
> Subject: How to simplify our development flow under the means of using
> Hive?
>
>
> Hi list,
>
> I'm going to take Hive into production to analyze our web logs, which are
> hundreds of gigabytes per day. Previously, we did this job with Apache
> Hadoop, running our own raw MapReduce code. It worked, but it also cut into
> our productivity: we kept writing code with similar logic again and again.
> It gets even worse when the format of our logs changes. For example, if we
> want to insert one more field into each line of the log, the previous work
> becomes useless and we have to redo it. Hence we are thinking about using
> Hive as a persistence layer, so that we can store and retrieve the schemas
> of the data easily. But we found that sometimes Hive cannot handle certain
> kinds of complex analysis because of the limited expressive power of SQL.
> We have to write our own UDFs, and even then there are things Hive still
> cannot do. So we also need to write raw MapReduce code, which brings up
> another problem: since one part is a set of SQL scripts and the other is
> Java or hybrid code, how do we coordinate Hive and raw MapReduce code, and
> how do we schedule them? How does Facebook use Hive? And what is your
> solution when you come across similar problems?
>
> In the end, we are considering using Hive as our data warehouse. Any
> suggestions?
>
> Thanks in advance!
> Min
>
> --
> My research interests are distributed systems, parallel computing and
> bytecode based virtual machine.
>
> http://coderplay.javaeye.com

Regards,
Min

--
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

http://coderplay.javaeye.com
