alexdongli0829 opened a new pull request, #4199:
URL: https://github.com/apache/hive/pull/4199

   ### What changes were proposed in this pull request?
   
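   For HIVE-24706, the main issue is that `HiveHBaseTableInputFormat` implements two versions of the InputFormat API. It extends HBase's `TableInputFormatBase`, which is a new-API (`org.apache.hadoop.mapreduce`) InputFormat, and it also implements the old-API (`org.apache.hadoop.mapred`) InputFormat interface. As a result, Spark cannot reliably pick the correct version, and the implementation itself is unclear.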
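
The following minimal, self-contained Java sketch illustrates the ambiguity. It uses hypothetical stand-in types instead of the real Hadoop and HBase classes. It also assumes a reader that prefers the new API whenever a class qualifies as both, and it is not Spark's actual code.

```java
// Hypothetical stand-ins for the two Hadoop InputFormat APIs. The real types are
// org.apache.hadoop.mapred.InputFormat (an interface) and
// org.apache.hadoop.mapreduce.InputFormat (an abstract class).
public class MixedApiDemo {
    /** Stand-in for the old-API org.apache.hadoop.mapred.InputFormat. */
    interface OldApiInputFormat {}

    /** Stand-in for the new-API org.apache.hadoop.mapreduce.InputFormat. */
    abstract static class NewApiInputFormat {}

    /** Stand-in for HBase's TableInputFormatBase, a new-API InputFormat. */
    abstract static class TableInputFormatBaseStandIn extends NewApiInputFormat {}

    /** Mirrors the current shape: extends the new-API base AND implements the old API. */
    static class MixedInputFormat extends TableInputFormatBaseStandIn implements OldApiInputFormat {}

    /** Hypothetical reader that checks for the new API before the old one. */
    static String chooseReadPath(Class<?> inputFormatClass) {
        if (NewApiInputFormat.class.isAssignableFrom(inputFormatClass)) {
            return "new-api";
        }
        if (OldApiInputFormat.class.isAssignableFrom(inputFormatClass)) {
            return "old-api";
        }
        throw new IllegalArgumentException("not an InputFormat: " + inputFormatClass);
    }

    public static void main(String[] args) {
        // The mixed class passes both checks, so the first check decides the path.
        System.out.println(chooseReadPath(MixedInputFormat.class)); // prints "new-api"
    }
}
```

Because the mixed class passes both checks, the check that a reader runs first decides the code path. This happens regardless of which API the storage handler actually expects.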
   
   In this PR, `HiveHBaseTableInputFormat` no longer extends `TableInputFormatBase` directly. Instead, it holds a `TableInputFormatBase` as a delegate. The delegate does exactly what the parent class did before, but this design avoids the confusion, because `HBaseStorageHandler` only needs the old-version InputFormat.
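
Here is a rough sketch of the delegation approach. It again uses hypothetical stand-in types, and a single simplified `getSplits()` replaces the real Hadoop signatures. The input format implements only the old API and forwards calls to a privately held new-API instance.

```java
// Hypothetical stand-ins; the real types live in Hadoop and HBase, and the real
// getSplits methods take job/context arguments and return InputSplit objects.
public class DelegateDemo {
    /** Stand-in for the old-API org.apache.hadoop.mapred.InputFormat. */
    interface OldApiInputFormat {
        String getSplits();
    }

    /** Stand-in for the new-API org.apache.hadoop.mapreduce.InputFormat. */
    abstract static class NewApiInputFormat {
        abstract String getSplits();
    }

    /** Stand-in for HBase's TableInputFormatBase, which holds the split logic. */
    static class TableInputFormatBaseStandIn extends NewApiInputFormat {
        @Override
        String getSplits() {
            return "splits-from-hbase";
        }
    }

    /**
     * Old-API-only input format: it no longer extends the new-API base,
     * but forwards to a privately held instance of it.
     */
    static class DelegatingInputFormat implements OldApiInputFormat {
        private final TableInputFormatBaseStandIn delegate = new TableInputFormatBaseStandIn();

        @Override
        public String getSplits() {
            return delegate.getSplits(); // same behavior as before, reached via delegation
        }
    }

    public static void main(String[] args) {
        DelegatingInputFormat format = new DelegatingInputFormat();
        System.out.println(format.getSplits());
        // Only the old-API type check succeeds now.
        System.out.println(NewApiInputFormat.class.isAssignableFrom(DelegatingInputFormat.class));
        System.out.println(OldApiInputFormat.class.isAssignableFrom(DelegatingInputFormat.class));
    }
}
```

With this shape, a type check against the new API fails, so a reader has only the old-API path available.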
   
   In the long term, I think Hive should update the storage handler instead of continuing to mix these two API versions.
   
   
   ### Why are the changes needed?
   
   The issue breaks compatibility between Spark and Hive. Multiple users of both Hive and Spark have reported it.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. This PR adds a new configuration parameter, `hive.hbase.inputformat.v2`, so the documentation may need an update to keep end users informed.
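
The description does not show how the flag is consumed. The sketch below shows one plausible approach. It assumes the flag is read as a boolean that defaults to `false` and switches to the delegate-based input format. Only the property name comes from this PR; the class names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class InputFormatFlagDemo {
    // Property name taken from the PR description.
    static final String V2_FLAG = "hive.hbase.inputformat.v2";

    // Hypothetical class names, used only for illustration.
    static final String LEGACY_FORMAT = "LegacyHBaseInputFormat";
    static final String V2_FORMAT = "DelegatingHBaseInputFormat";

    /** Picks an input format name from a key/value config; assumes the flag defaults to false. */
    static String chooseInputFormat(Map<String, String> conf) {
        boolean useV2 = Boolean.parseBoolean(conf.getOrDefault(V2_FLAG, "false"));
        return useV2 ? V2_FORMAT : LEGACY_FORMAT;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(chooseInputFormat(conf)); // flag unset: legacy format
        conf.put(V2_FLAG, "true");
        System.out.println(chooseInputFormat(conf)); // flag enabled: delegate-based format
    }
}
```

With the flag off by default, existing Hive users keep the current behavior. Clients such as Spark can opt in explicitly.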
   
   
   ### How was this patch tested?
   
   Create the HBase table:
   
   ```
   echo "create 'students','account','address'" | sudo -u hbase hbase shell -n
   echo "put 'students','student1','account:name','Alice'" | sudo -u hbase hbase shell -n
   ```
   
   Create the Hive table:
   
   ```
   hive -e "create external table test1 (key string, value string)
   stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   with serdeproperties ('hbase.columns.mapping' = ':key,account:name')
   tblproperties ('hbase.table.name' = 'students')"
   ```
   
   Output:
   
   ```
   SLF4J: Class path contains multiple SLF4J bindings.
   Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j2.properties Async: true
   Hive Session ID = 05b4ec22-d15f-4614-9bc3-6c183e868728
   OK
   Time taken: 2.913 seconds
   ```
   
   Spark test:
   
   ```
   spark-sql \
     --jars /usr/lib/hive/lib/hive-hbase-handler.jar,/usr/lib/hbase/hbase-common-2.4.4.jar,/usr/lib/hbase/hbase-client-2.4.4.jar,/usr/lib/hbase/lib/hbase-mapreduce-2.4.4.jar,/usr/lib/hbase/lib/shaded-clients/hbase-shaded-client-2.4.4.jar \
     --conf spark.hive.hbase.inputformat.v2=true
   ```
   
   ```
   spark-sql> select * from test1;
   
   student1    Alice
   ```
   
   Unit test:
   
   ```
   [INFO] -------------------------------------------------------
   [INFO]  T E S T S
   [INFO] -------------------------------------------------------
   [INFO] Running org.apache.hadoop.hive.hbase.TestHiveHBaseTableInputFormatV2
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.069 s - in org.apache.hadoop.hive.hbase.TestHiveHBaseTableInputFormatV2
   [INFO]
   [INFO] Results:
   [INFO]
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

