Actually

add file

is the correct command.

Ashish

________________________________
From: Min Zhou [mailto:[email protected]]
Sent: Monday, May 25, 2009 1:11 AM
To: [email protected]
Subject: Re: Hive and Hadoop streaming

Hey Zheng,

I don't think Hive supports the 'add jar' command right now, because the code for this issue has not been committed yet. Check it out at:
https://issues.apache.org/jira/browse/HIVE-338

On Mon, May 25, 2009 at 3:59 PM, Manhee Jo <[email protected]> wrote:

Thank you so much!!!

----- Original Message -----
From: Zheng Shao
To: [email protected]
Sent: Monday, May 25, 2009 4:33 PM
Subject: Re: Hive and Hadoop streaming

In this case, you just need to compile your .java into a jar file and do:

add jar fullpath/to/myprogram.jar;

SELECT TRANSFORM(col1, col2, col3, col4)
USING 'java -cp myprogram.jar WeekdayMapper'
AS (outcol1, outcol2, outcol3, outcol4)

Let us know if it works out or not.

Zheng

On Sun, May 24, 2009 at 10:50 PM, Manhee Jo <[email protected]> wrote:

Thank you Zheng,

Here is my WeekdayMapper.java, which is just a test that does almost the same thing as the "weekday_mapper.py" does. As you see below, it takes neither a WritableComparable nor a Writable class; it receives the 4 columns as plain string arguments. Any advice would be much appreciated.
/**
 * WeekdayMapper.java
 */
import java.io.*;
import java.util.*;

class WeekdayMapper {
    public static void main(String[] args) throws IOException {
        Scanner stdIn = new Scanner(System.in);
        String line = null;
        String[] column;
        long unixTime;
        Date d;
        GregorianCalendar cal1 = new GregorianCalendar();
        while (stdIn.hasNext()) {
            line = stdIn.nextLine();
            column = line.split("\t");
            unixTime = Long.parseLong(column[3]);   // unixtime is the 4th input column
            d = new Date(unixTime * 1000);          // Date expects milliseconds
            cal1.setTime(d);
            int dow = cal1.get(Calendar.DAY_OF_WEEK);
            System.out.println(column[0] + "\t" + column[1] + "\t" + column[2] + "\t" + dow);
        }
    }
}

Thanks,
Manhee

----- Original Message -----
From: Zheng Shao
To: [email protected]
Sent: Monday, May 25, 2009 10:28 AM
Subject: Re: Hive and Hadoop streaming

How does your Java map function receive the 4 columns? I assume your Java map function takes a WritableComparable key and a Writable value.

Zheng

2009/5/24 Manhee Jo <[email protected]>

I have some mappers already coded in Java, so I want to reuse them as much as possible in the Hive environment. How can I call a Java mapper from "select transform" in Hive? For example, what is wrong with the query below, and why?

INSERT OVERWRITE TABLE u_data_new
SELECT
  TRANSFORM (userid, movieid, rating, unixtime)
  USING 'java WeekdayMapper'
  AS (userid, movieid, rating, weekday)
FROM u_data;

Thank you.

Regards,
Manhee

--
Yours,
Zheng

--
My research interests are distributed systems, parallel computing, and bytecode-based virtual machines.
My profile: http://www.linkedin.com/in/coderplay
My blog: http://coderplay.javaeye.com
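[Editor's note: the step the whole thread hinges on is the unix-time-to-weekday conversion inside the mapper. A minimal, self-contained sketch of just that step is below; it uses java.util.Calendar as the original does, but pins the timezone to UTC and uses a fixed timestamp (both assumptions not in the original) so the result is deterministic.]

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class WeekdayDemo {
    public static void main(String[] args) {
        // 1243209600 is 2009-05-25 00:00:00 UTC, a Monday (unix time in seconds).
        long unixTime = 1243209600L;
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.setTime(new Date(unixTime * 1000));   // Date takes milliseconds
        int dow = cal.get(Calendar.DAY_OF_WEEK);  // Sunday = 1 ... Saturday = 7
        System.out.println(dow);                  // prints 2 (Monday)
    }
}
```

Note that without an explicit TimeZone the mapper's output depends on the JVM's default zone, which can differ across cluster nodes.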
