Hey Zheng,

I don't think Hive supports the 'add jar' command right now, because the code
for this issue has not been committed yet.
Check it out at:
https://issues.apache.org/jira/browse/HIVE-338
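
In the meantime, here is a quick standalone check of the day-of-week logic that WeekdayMapper relies on (the class name and sample timestamp are just for illustration; I pin the calendar to UTC so the result is deterministic, while the mapper itself uses the JVM's default time zone):

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class WeekdayCheck {
    // Same computation as WeekdayMapper: unix timestamp (seconds) ->
    // Calendar day-of-week constant (SUNDAY == 1 ... SATURDAY == 7).
    static int dayOfWeek(long unixTimeSeconds) {
        GregorianCalendar cal =
            new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.setTime(new Date(unixTimeSeconds * 1000L));
        return cal.get(Calendar.DAY_OF_WEEK);
    }

    public static void main(String[] args) {
        // 1970-01-01 00:00:00 UTC was a Thursday (Calendar.THURSDAY == 5).
        System.out.println(dayOfWeek(0L));   // prints 5
    }
}
```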

On Mon, May 25, 2009 at 3:59 PM, Manhee Jo <[email protected]> wrote:

>  Thank you so much!!!
>
>
> ----- Original Message -----
> *From:* Zheng Shao <[email protected]>
> *To:* [email protected]
> *Sent:* Monday, May 25, 2009 4:33 PM
> *Subject:* Re: Hive and Hadoop streaming
>
> In this case, you just need to compile your .java into a jar file, and do
>
> add jar fullpath/to/myprogram.jar;
> SELECT TRANSFORM(col1, col2, col3, col4)
> USING 'java -cp myprogram.jar WeekdayMapper'
> AS (outcol1, outcol2, outcol3, outcol4)
>
> Let us know if it works out or not.
>
> Zheng
>
> On Sun, May 24, 2009 at 10:50 PM, Manhee Jo <[email protected]> wrote:
>
>>  Thank you Zheng,
>> Here is my WeekdayMapper.java, which is just a test that does almost the
>> same thing as "weekday_mapper.py" does.
>> As you see below, it takes neither a WritableComparable nor a Writable
>> class. It receives the 4 columns as plain string
>> arguments. Any advice would be much appreciated.
>>
>> /**
>>  *  WeekdayMapper.java
>>  */
>>
>> import java.io.*;
>> import java.util.*;
>>
>> class WeekdayMapper {
>>   public static void main(String[] args) throws IOException {
>>     Scanner stdIn = new Scanner(System.in);
>>     String line;
>>     String[] column;
>>     long unixTime;
>>     Date d;
>>     GregorianCalendar cal1 = new GregorianCalendar();
>>
>>     while (stdIn.hasNext()) {
>>       line = stdIn.nextLine();
>>       column = line.split("\t");
>>       unixTime = Long.parseLong(column[3]);
>>       d = new Date(unixTime * 1000);
>>       cal1.setTime(d);
>>       int dow = cal1.get(Calendar.DAY_OF_WEEK);
>>       System.out.println(column[0] + "\t" + column[1] + "\t"
>>           + column[2] + "\t" + dow);
>>     }
>>   }
>> }
>>
>> Thanks,
>> Manhee
>>
>>
>> ----- Original Message -----
>> *From:* Zheng Shao <[email protected]>
>> *To:* [email protected]
>> *Sent:* Monday, May 25, 2009 10:28 AM
>> *Subject:* Re: Hive and Hadoop streaming
>>
>> How does your Java map function receive the 4 columns?
>> I assume it takes a WritableComparable key and a Writable value.
>>
>> Zheng
>>
>> 2009/5/24 Manhee Jo <[email protected]>
>>
>>>  I have some mappers already coded in Java, so I want to reuse them
>>> as much as possible in the Hive environment.
>>> How can I call a Java mapper from "select transform" in Hive?
>>> For example, what is wrong with the query below and why?
>>>
>>> INSERT OVERWRITE TABLE u_data_new
>>> SELECT
>>>   TRANSFORM (userid, movieid, rating, unixtime)
>>>   USING 'java WeekdayMapper'
>>>   AS (userid, movieid, rating, weekday)
>>> FROM u_data;
>>> Thank you.
>>>
>>>
>>> Regards,
>>> Manhee
>>>
>>
>>
>>
>> --
>> Yours,
>> Zheng
>>
>>
>
>
> --
> Yours,
> Zheng
>
>


-- 
My research interests are distributed systems, parallel computing and
bytecode based virtual machine.

My profile:
http://www.linkedin.com/in/coderplay
My blog:
http://coderplay.javaeye.com
