On Wed, Sep 22, 2010 at 12:08 AM, Tianqiang  Li <peter...@gmail.com> wrote:
> Hi,
> I have a customized InputFormat class, built on top of the Hadoop 0.20 API,
> that reads our log format in our Hadoop jobs and in Pig. Now I'd like to
> re-use this InputFormat to load data into a Hive table by specifying the
> InputFormat and a SerDe when I create a table, like below:
>
> CREATE TABLE rawlog_test (
>   user_id  STRING,
>   tag  STRING,
>   my_timestamp  STRING )
> ROW FORMAT SERDE 'x.y.z.mySerDe'
> STORED AS INPUTFORMAT 'x.y.z.myInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileOutputFormat' ;
>
> Then I run:
> load data inpath '/rawlog.txt' into table rawlog_test;
>
> No error shows up on screen, but I found the deserialize function never
> got called. And when I run select * from rawlog_test; an error is thrown:
>
> FAILED: Error in semantic analysis: line 1:14 Input Format must implement
> InputFormat rawlog_test
>
> I searched for this on the internet and found it might be related to Hive
> using the old (0.17) InputFormat API. Does anybody know if there is a way
> to get the 0.20 API working with Hive? Adapting my code to the old API
> needs a lot of work, and even if I get it done, maintaining two versions of
> the code seems a bit unnecessary. (Pig 0.7 works well with my 0.20
> InputFormat, and we need to use Pig and Hive in different situations.) Is
> there any way I can work around this? My version of Hive is 0.7, and Hadoop
> is 0.20.1 from CDH2. Thanks.
>
> Regards,
> Peter
>
>

You can make a 0.20 InputFormat work with Hive, but it's a real PITA. The
HBase and Cassandra handlers both do it. Essentially you have to extend the
new mapreduce InputFormat and then implement the methods of the old one,
using final variables and chained method calls. Example here:
https://issues.apache.org/jira/secure/attachment/12452140/hive-1434-4-patch.txt
If your input format is simple enough, it is likely easier to write two
separate classes, one for the old API and one for the new, and use the
mapred.* InputFormat with Hive (a sketch of that follows below).
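
A minimal sketch of the "two separate classes" route, assuming the log can
simply be handed to the SerDe line by line; the package and class name
(x.y.z.MyOldApiInputFormat) are made up to match your example, and the stock
LineRecordReader is just a stand-in for your real log-parsing reader:

package x.y.z;

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Hive's STORED AS INPUTFORMAT needs a class implementing the old
// org.apache.hadoop.mapred.InputFormat interface, which is what the
// "Input Format must implement InputFormat" error is complaining about.
public class MyOldApiInputFormat
    extends FileInputFormat<LongWritable, Text> {

  @Override
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    reporter.setStatus(split.toString());
    // Hand each raw line to the SerDe as Text; replace this with the
    // record-reading logic from your existing 0.20 InputFormat.
    return new LineRecordReader(job, (FileSplit) split);
  }
}

With that on the classpath you would point STORED AS INPUTFORMAT in the
CREATE TABLE above at 'x.y.z.MyOldApiInputFormat' and keep the SerDe as is.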
