[ 
https://issues.apache.org/jira/browse/HIVE-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated HIVE-4969:
----------------------------------

    Attachment: HIVE-4969-1.patch
    
> HCatalog HBaseHCatStorageHandler is not returning all the data
> --------------------------------------------------------------
>
>                 Key: HIVE-4969
>                 URL: https://issues.apache.org/jira/browse/HIVE-4969
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.11.0
>            Reporter: Venki Korukanti
>            Priority: Critical
>             Fix For: 0.11.1, 0.12.0
>
>         Attachments: HIVE-4969-1.patch
>
>
> Repro steps:
> 1) Create an HCatalog table mapped to HBase table.
> hcat -e "CREATE TABLE studentHCat(rownum int, name string, age int, gpa float)
>          STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
>          TBLPROPERTIES('hbase.table.name' ='studentHBase',  
>                        'hbase.columns.mapping' =                
>                             ':key,onecf:name,twocf:age,threecf:gpa')";
> 2) Load the following data from Pig.
> cat student_data
> 1^Asarah laertes^A23^A2.40
> 2^Atom allen^A72^A1.57
> 3^Abob ovid^A61^A2.67
> 4^Aethan nixon^A38^A2.15
> 5^Acalvin robinson^A28^A2.53
> 6^Airene ovid^A65^A2.56
> 7^Ayuri garcia^A36^A1.65
> 8^Acalvin nixon^A41^A1.04
> 9^Ajessica davidson^A48^A2.11
> 10^Akatie king^A39^A1.05
> grunt> A = LOAD 'student_data' AS (rownum:int,name:chararray,age:int,gpa:float);
> grunt> STORE A INTO 'studentHCat' USING org.apache.hcatalog.pig.HCatStorer();
> 3) Now from HBase do a scan on the studentHBase table
> hbase(main):026:0> scan 'studentHBase', {LIMIT => 5}
> 4) From pig access the data in table
> grunt> A = LOAD 'studentHCat' USING org.apache.hcatalog.pig.HCatLoader();
> grunt> STORE A INTO '/user/root/studentPig';
> 5) Verify the output written to /user/root/studentPig
> hadoop fs -cat /user/root/studentPig/part-r-00000
> 1              23
> 2              72
> 3              61
> 4              38
> 5              28
> 6              65
> 7              36
> 8              41
> 9              48
> 10             39
> The output contains only two of the four fields (rownum and age); name and gpa are missing.
> Problem:
> While reading data from the HBase table, HbaseSnapshotRecordReader receives 
> each row as a Result (org.apache.hadoop.hbase.client.Result) object and 
> processes the KeyValue entries in it. After processing, it builds a new 
> Result object from the processed KeyValue array. The problem is that this 
> array is not sorted, whereas Result expects its input KeyValue array to be 
> sorted. Result.getValue() performs a binary search over the array, so on an 
> unsorted array it returns no value for some of the fields.
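The failure mode described above can be illustrated outside HBase. The Java sketch below is illustrative only: Cell is a hypothetical stand-in for KeyValue, COLUMN_ORDER compares family and then qualifier (for these ASCII names this matches byte order), and getValue mimics a binary-search column lookup with java.util.Arrays.binarySearch rather than calling the real Result API. The cells are assumed to arrive in column-mapping order (onecf, twocf, threecf); the report does not say which order the record reader actually produces. Because "threecf" sorts before "twocf", that order is not sorted, and the lookup for threecf:gpa misses. In this particular ordering only gpa is lost; which fields go missing depends on the actual order of the array.

```java
import java.util.Arrays;
import java.util.Comparator;

public class UnsortedBinarySearchDemo {

    // Hypothetical stand-in for an HBase KeyValue: one cell of a single row.
    // Only the family/qualifier ordering matters for this illustration.
    static final class Cell {
        final String family;
        final String qualifier;
        final String value;

        Cell(String family, String qualifier, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        String family() { return family; }
        String qualifier() { return qualifier; }
    }

    // Order cells by family, then qualifier (all cells share one row key here).
    static final Comparator<Cell> COLUMN_ORDER =
            Comparator.comparing(Cell::family).thenComparing(Cell::qualifier);

    // Mimics a binary-search column lookup: correct only if 'cells' is sorted
    // by COLUMN_ORDER. Returns null when the column is not found.
    static String getValue(Cell[] cells, String family, String qualifier) {
        int idx = Arrays.binarySearch(cells, new Cell(family, qualifier, null), COLUMN_ORDER);
        return idx >= 0 ? cells[idx].value : null;
    }

    public static void main(String[] args) {
        // Hypothetical: cells rebuilt in column-mapping order, which is NOT
        // sorted, because "threecf" sorts before "twocf".
        Cell[] unsorted = {
            new Cell("onecf", "name", "sarah laertes"),
            new Cell("twocf", "age", "23"),
            new Cell("threecf", "gpa", "2.40"),
        };
        System.out.println("unsorted gpa = " + getValue(unsorted, "threecf", "gpa"));

        // Sorting with the same comparator the lookup uses restores correct lookups.
        Cell[] sorted = unsorted.clone();
        Arrays.sort(sorted, COLUMN_ORDER);
        System.out.println("sorted gpa   = " + getValue(sorted, "threecf", "gpa"));
    }
}
```

Running main prints "unsorted gpa = null" followed by "sorted gpa   = 2.40". Sorting the KeyValue array with HBase's own KeyValue comparator before constructing the new Result is one plausible remedy; the attached patch is not reproduced here, so this sketch should not be read as its exact change.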

