[
https://issues.apache.org/jira/browse/HIVE-4969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thejas M Nair updated HIVE-4969:
--------------------------------
Fix Version/s: (was: 0.11.1)
(was: 0.12.0)
Preparing for the 0.12 release. Removing the 0.12 fix version from issues that
are not in the 0.12 branch.
> HCatalog HBaseHCatStorageHandler is not returning all the data
> --------------------------------------------------------------
>
> Key: HIVE-4969
> URL: https://issues.apache.org/jira/browse/HIVE-4969
> Project: Hive
> Issue Type: Bug
> Components: HCatalog
> Affects Versions: 0.11.0
> Reporter: Venki Korukanti
> Priority: Critical
> Attachments: HIVE-4969-1.patch
>
>
> Repro steps:
> 1) Create an HCatalog table mapped to an HBase table.
> hcat -e "CREATE TABLE studentHCat(rownum int, name string, age int, gpa float)
> STORED BY 'org.apache.hcatalog.hbase.HBaseHCatStorageHandler'
> TBLPROPERTIES('hbase.table.name' ='studentHBase',
> 'hbase.columns.mapping' =
> ':key,onecf:name,twocf:age,threecf:gpa')";
> 2) Load the following data from Pig.
> cat student_data
> 1^Asarah laertes^A23^A2.40
> 2^Atom allen^A72^A1.57
> 3^Abob ovid^A61^A2.67
> 4^Aethan nixon^A38^A2.15
> 5^Acalvin robinson^A28^A2.53
> 6^Airene ovid^A65^A2.56
> 7^Ayuri garcia^A36^A1.65
> 8^Acalvin nixon^A41^A1.04
> 9^Ajessica davidson^A48^A2.11
> 10^Akatie king^A39^A1.05
> grunt> A = LOAD 'student_data' AS
> (rownum:int,name:chararray,age:int,gpa:float);
> grunt> STORE A INTO 'studentHCat' USING org.apache.hcatalog.pig.HCatStorer();
> 3) Now from HBase, scan the studentHBase table.
> hbase(main):026:0> scan 'studentHBase', {LIMIT => 5}
> 4) From Pig, read the data from the studentHCat table.
> grunt> A = LOAD 'studentHCat' USING org.apache.hcatalog.pig.HCatLoader();
> grunt> STORE A INTO '/user/root/studentPig';
> 5) Verify the output written to studentPig.
> hadoop fs -cat /user/root/studentPig/part-r-00000
> 1 23
> 2 72
> 3 61
> 4 38
> 5 28
> 6 65
> 7 36
> 8 41
> 9 48
> 10 39
> The returned data has only two fields (rownum and age); name and gpa are
> missing.
> Problem:
> While reading data from the HBase table, HbaseSnapshotRecordReader receives
> each row as a Result (org.apache.hadoop.hbase.client.Result) object and
> processes the KeyValue fields in it. After processing, it creates another
> Result object from the processed KeyValue array. The problem is that this
> KeyValue array is not sorted, while Result expects its input KeyValue array
> to be sorted. Result.getValue() therefore returns no value for some of the
> fields, because it does a binary search on an unordered array.
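
The failure mode described above can be illustrated with a small,
language-neutral sketch. This is not HBase code: get_value, the cell tuples,
and the particular unsorted order are all hypothetical stand-ins, and string
comparison of column names is only a rough analogue of how Result compares
KeyValues. It shows how a binary search over an unordered array can miss an
entry that is present, and how sorting the array first avoids the miss.

```python
from bisect import bisect_left


def get_value(cells, column):
    """Look up a column by binary search, assuming cells is sorted by column.

    Hypothetical stand-in for a sorted-array lookup; not the HBase API.
    """
    keys = [name for name, _ in cells]
    i = bisect_left(keys, column)
    if i < len(keys) and keys[i] == column:
        return cells[i][1]
    return None  # not found (or missed because the input was not sorted)


# Hypothetical cell order after processing: NOT sorted by column name,
# since "threecf:gpa" sorts before "twocf:age".
unsorted_cells = [
    ("onecf:name", "sarah laertes"),
    ("twocf:age", 23),
    ("threecf:gpa", 2.40),
]

# Sorting by column name before the lookup restores the search invariant.
sorted_cells = sorted(unsorted_cells, key=lambda cell: cell[0])

print(get_value(unsorted_cells, "threecf:gpa"))  # the entry exists but is missed
print(get_value(sorted_cells, "threecf:gpa"))    # found once the array is sorted
```

Which fields go missing depends on the actual order of the array, so this
sketch is not expected to drop the same columns as the repro above.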