Re: Reading 2 table data in MapReduce for Performing Join

Suraj Nayak Thu, 19 Mar 2015 09:21:07 -0700

Is this related to https://issues.apache.org/jira/browse/HIVE-4329 ? Is
there a workaround?
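
One possible reading of the trace quoted below: the mapper declares LongWritable as its input key type (which TEXTFILE input supplies), while the ORC reader hands it a NullWritable key, so the cast fails on entry to map(). The sketch below reproduces that mechanism in plain Java. NullKey, LongKey, BaseMapper and both mapper classes are hypothetical stand-ins, not Hadoop or HCatalog classes, and this has not been run against the HCatMultipleInputs patch.

```java
public class KeyTypeDemo {
    // Hypothetical stand-ins for NullWritable and LongWritable.
    static final class NullKey {}
    static final class LongKey {}

    // Hypothetical stand-in for Mapper<KEYIN, VALUEIN, ...>.
    static abstract class BaseMapper<K, V> {
        abstract String map(K key, V value);
    }

    // Declares a specific key type, like a mapper written for TEXTFILE input.
    static final class TypedMapper extends BaseMapper<LongKey, String> {
        @Override
        String map(LongKey key, String value) {
            return value;
        }
    }

    // Accepts any key and never inspects it.
    static final class KeyAgnosticMapper extends BaseMapper<Object, String> {
        @Override
        String map(Object key, String value) {
            return value;
        }
    }

    // The framework side: it calls map() without knowing the declared key type,
    // so a mismatch only surfaces at run time as a ClassCastException.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static String run(BaseMapper mapper, Object key, String value) {
        return mapper.map(key, value);
    }

    // Returns true if calling the mapper with this key throws ClassCastException.
    static boolean failsWithClassCast(BaseMapper<?, ?> mapper, Object key) {
        try {
            run(mapper, key, "row");
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsWithClassCast(new TypedMapper(), new NullKey()));
        System.out.println(failsWithClassCast(new KeyAgnosticMapper(), new NullKey()));
    }
}
```

If this reading is right, the analogous change in the real job would be to declare the mapper's input key as WritableComparable, as the HCatalog wiki example does, and to ignore the key. Whether that is sufficient for ORC tables here is an assumption.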


On Thu, Mar 19, 2015 at 9:47 PM, Suraj Nayak <snay...@gmail.com> wrote:

> Hi All,
>
> I was able to integrate HCatMultipleInputs, using the patch, for tables
> created with TEXTFILE. However, I get an error when I read a table created
> with ORC. The error is below:
>
> 15/03/19 10:51:32 INFO mapreduce.Job: Task Id : attempt_1425012118520_9756_m_000000_0, Status : FAILED
> Error: java.lang.ClassCastException: org.apache.hadoop.io.NullWritable cannot be cast to org.apache.hadoop.io.LongWritable
>     at com.abccompany.mapreduce.MyMapper.map(MyMapper.java:15)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>
>
> Can anyone help?
>
> Thanks in advance!
>
> On Wed, Mar 18, 2015 at 11:00 PM, Suraj Nayak <snay...@gmail.com> wrote:
>
>> Hi All,
>>
>> The patch from https://issues.apache.org/jira/browse/HIVE-4997 helped!
>>
>>
>> On Tue, Mar 17, 2015 at 1:05 AM, Suraj Nayak <snay...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I tried reading data from one Hive table via HCatalog in MapReduce, using
>>> something similar to
>>> https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput#HCatalogInputOutput-RunningMapReducewithHCatalog
>>> and was able to read it successfully.
>>>
>>> Now I am trying to read 2 tables, as the requirement is to join them.
>>> I did not find an API similar to *FileInputFormat.addInputPaths* in
>>> *HCatInputFormat*. What is the equivalent in HCatalog?
>>>
>>> I had previously performed a join using FileInputFormat on HDFS (by getting
>>> the split information in the mapper). This article
>>> (http://www.codingjunkie.com/mapreduce-reduce-joins/) helped me code the
>>> join. Can someone suggest how I can perform a join using HCatalog?
>>>
>>> Briefly, the aim is to
>>>
>>>    - Read 2 tables (with almost similar schemas).
>>>    - If a key exists in both tables, send its records to the same reducer.
>>>    - Do some processing on the records in the reducer.
>>>    - Save the output to a file or Hive table.
>>>
>>> *P.S.: The reason for using MapReduce to perform the join is a complex
>>> requirement that can't be solved directly in Hive/Pig.*
>>>
>>> Any help will be greatly appreciated :)
>>>
>>> --
>>> Thanks
>>> Suraj Nayak M
>>>
>>
>>
>>
>> --
>> Thanks
>> Suraj Nayak M
>>
>
>
>
> --
> Thanks
> Suraj Nayak M
>



-- 
Thanks
Suraj Nayak M
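
For reference, the join outlined in the first message of this thread (read two tables, route records that share a key to the same reducer, process them there) can be sketched in plain Java without Hadoop. This is only a minimal illustration of the reduce-side join idea, not HCatalog code: the table tags "A" and "B", the comma-separated row format, and the assumption that the join key is the first field are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {
    // A row tagged with the table it came from, so the reducer can tell sources apart.
    static final class Tagged {
        final String table;
        final String row;

        Tagged(String table, String row) {
            this.table = table;
            this.row = row;
        }
    }

    // "Map" phase: emit (joinKey, taggedRow). The join key is assumed to be
    // the first comma-separated field of each row.
    static void mapRows(String table, List<String> rows, Map<String, List<Tagged>> shuffle) {
        for (String row : rows) {
            String key = row.split(",", 2)[0];
            shuffle.computeIfAbsent(key, k -> new ArrayList<>()).add(new Tagged(table, row));
        }
    }

    // "Reduce" phase: called once per key with every row that shares it.
    // Output is produced only when both tables contributed a row.
    static List<String> reduce(List<Tagged> values) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (Tagged t : values) {
            (t.table.equals("A") ? left : right).add(t.row);
        }
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                out.add(l + " | " + r);
            }
        }
        return out;
    }

    // Simulates the shuffle with a sorted map from key to grouped rows.
    static List<String> join(List<String> tableA, List<String> tableB) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>();
        mapRows("A", tableA, shuffle);
        mapRows("B", tableB, shuffle);
        List<String> out = new ArrayList<>();
        for (List<Tagged> group : shuffle.values()) {
            out.addAll(reduce(group));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("1,alice", "2,bob");
        List<String> b = Arrays.asList("2,sales", "3,ops");
        System.out.println(join(a, b)); // prints [2,bob | 2,sales]
    }
}
```

In a real job the tag would come from whichever input produced the record, and the grouping would be done by the MapReduce shuffle rather than an in-memory map.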
