[jira] [Updated] (HIVE-10830) First column of a Hive table created with LazyBinaryColumnarSerDe is not read properly

lovekesh bansal (JIRA) Tue, 26 May 2015 22:08:07 -0700

     [ https://issues.apache.org/jira/browse/HIVE-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


lovekesh bansal updated HIVE-10830:
-----------------------------------
    Description: 
1. create external table platdev.table_target ( id INT, message String, state 
string, date string ) partitioned by (country string) row format delimited 
fields terminated by ',' stored as RCFILE location 
'/user/nikgupta/table_target' ;

2. insert overwrite table platdev.table_target partition(country) select case 
when id=13 then 15 else id end,message,state,date,country from 
platdev.table_base2 where id between 13 and 16;

Say the table is now written with the default LazyBinaryColumnarSerDe and 
contains the following data:
15      thirteen        delhi           2-12-2014       india
14      fourteen        delhi           1-1-2014        india
15      fifteen         florida         1-1-2014        us
16      sixteen         florida         2-12-2014       us

Now, if I try to read the data with a MapReduce program whose map function is 
given below:

public void map(LongWritable key, BytesRefArrayWritable val, Context context)
    throws IOException, InterruptedException {

    // Print every cell of the row, interpreting its bytes as text.
    for (int i = 0; i < val.size(); i++) {
        BytesRefWritable bytesRefread = val.get(i);
        // Copy only this cell's slice out of the shared backing array.
        byte[] currentCell = Arrays.copyOfRange(bytesRefread.getData(),
            bytesRefread.getStart(),
            bytesRefread.getStart() + bytesRefread.getLength());
        Text currentCellStr = new Text(currentCell);
        System.out.println("rowText=" + currentCellStr);
    }
    // The original snippet wrote an undeclared variable "bytes" here;
    // writing the row itself (val) is assumed.
    context.write(NullWritable.get(), val);
}


and set the following job configuration parameters:

job.setInputFormatClass(RCFileMapReduceInputFormat.class);
job.setOutputFormatClass(RCFileMapReduceOutputFormat.class);
jobConf.setInt(RCFile.COLUMN_NUMBER_CONF_STR, 5);

The output with LazyBinaryColumnarSerDe is as follows:
rowText=
rowText=fifteen
rowText=goa
rowText=2-2-2222
rowText=us

But exactly the same case, with ColumnarSerDe specified explicitly in the 
table definition, gives the following output:
rowText=1
rowText=fifteen
rowText=goa
rowText=2-2-2222
rowText=us

The point is that the first column value is missing in the case of 
LazyBinaryColumnarSerDe.
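
To check whether the first cell is really empty or only holds bytes that do 
not print as text, each cell can be dumped in hex next to its text form. The 
following is a minimal sketch in plain Java with no Hadoop dependencies. The 
class CellDump and its method names are hypothetical, and the single byte 0x0F 
used for the first cell is only an assumption about what a binary encoding of 
an INT might look like; it is not taken from the actual file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CellDump {
    // Copies the cell's slice out of the backing array, as the map function does.
    public static byte[] slice(byte[] data, int start, int length) {
        return Arrays.copyOfRange(data, start, start + length);
    }

    // Renders the cell as lowercase hex, two digits per byte.
    public static String toHex(byte[] data, int start, int length) {
        StringBuilder sb = new StringBuilder();
        for (byte b : slice(data, start, length)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decodes the cell as UTF-8 text, the same interpretation the map function applies.
    public static String toText(byte[] data, int start, int length) {
        return new String(data, start, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical backing buffer: one assumed binary byte, then "fifteen" as text.
        byte[] buf = {0x0F, 'f', 'i', 'f', 't', 'e', 'e', 'n'};
        System.out.println("cell0 text=[" + toText(buf, 0, 1) + "] hex=" + toHex(buf, 0, 1));
        System.out.println("cell1 text=[" + toText(buf, 1, 7) + "] hex=" + toHex(buf, 1, 7));
    }
}
```

In the map function the same helpers could be called with 
bytesRefread.getData(), getStart() and getLength(). A non-empty hex string 
next to an empty-looking rowText would suggest the cell holds binary data 
rather than no data.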

  was:
1. create external table platdev.table_target ( id INT, message String, state 
string, date string ) partitioned by (country string) row format delimited 
fields terminated by ',' stored as RCFILE location 
'/user/nikgupta/table_target' ;

2. insert overwrite table platdev.table_target partition(country) select case 
when id=13 then 15 else id end,message,state,date,country from 
platdev.table_base2 where id between 13 and 16; \n"

say now my table has the following data:
15      thirteen        delhi           2-12-2014       india
14      fourteen        delhi           1-1-2014                india
15      fifteen florida 1-1-2014                us
16      sixteen florida 2-12-2014       us

Now If I try to read the data with a mapreduce program, with map function as 
given below:

public void map(LongWritable key, BytesRefArrayWritable val, Context context)
    throws IOException, InterruptedException {
    
    for (int i = 0; i < val.size(); i++) {
     BytesRefWritable bytesRefread = val.get(i);
     byte[] currentCell = Arrays.copyOfRange(bytesRefread.getData(), 
bytesRefread.getStart(), bytesRefread.getStart()+bytesRefread.getLength());
     Text currentCellStr = new Text(currentCell);
     System.out.println("rowText="+currentCellStr       );
    }
    context.write(NullWritable.get(), bytes);
   }


and set  the following job configuration parameters:- 

job.setInputFormatClass(RCFileMapReduceInputFormat.class);
job.setOutputFormatClass(RCFileMapReduceOutputFormat.class);
jobConf.setInt(RCFile.COLUMN_NUMBER_CONF_STR, 5)
             

The output shown is as follows:
rowText=
rowText=fifteen
rowText=goa
rowText=2-2-2222
rowText=us

But exactly the same case using the ColumnarSerDe explicitly in the table 
definition would give the following output:
rowText=1
rowText=fifteen
rowText=goa
rowText=2-2-2222
rowText=us

Point is that First column value is missing. 


> First column of a Hive table created with LazyBinaryColumnarSerDe is not read 
> properly
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-10830
>                 URL: https://issues.apache.org/jira/browse/HIVE-10830
>             Project: Hive
>          Issue Type: Bug
>            Reporter: lovekesh bansal
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
