alexdongli0829 opened a new pull request, #4199:
URL: https://github.com/apache/hive/pull/4199
### What changes were proposed in this pull request?
For HIVE-24706, the root issue is that `HiveHBaseTableInputFormat` implements two versions of the Hadoop InputFormat API at once. It extends HBase's `TableInputFormatBase` (the new `org.apache.hadoop.mapreduce` API) while also implementing the old `org.apache.hadoop.mapred.InputFormat` interface. As a result, Spark cannot reliably tell which version to use, and the implementation is unclear in any case.
In this PR, instead of extending `TableInputFormatBase` directly, the input format holds it as a delegate that does exactly the same work as before. This removes the ambiguity, because `HBaseStorageHandler` only needs the old-API InputFormat.
In the long term, I think Hive should update the storage handler rather than keep mixing these two API versions.
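To illustrate the ambiguity and the delegate approach, here is a minimal, self-contained Java sketch. It does not use the real Hadoop or HBase classes: `OldApiInputFormat`, `NewApiInputFormat`, and `TableInputFormatBaseStub` are hypothetical stand-ins for `org.apache.hadoop.mapred.InputFormat`, `org.apache.hadoop.mapreduce.InputFormat`, and HBase's `TableInputFormatBase`. `chooseApi` assumes a reader that prefers the new API whenever the class is assignable to it; that rule is an illustrative assumption, not Spark's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class DelegateSketch {
    // Hypothetical stand-in for the old API (org.apache.hadoop.mapred.InputFormat).
    interface OldApiInputFormat {
        List<String> getSplits(int numSplits);
    }

    // Hypothetical stand-in for the new API (org.apache.hadoop.mapreduce.InputFormat).
    static abstract class NewApiInputFormat {
        abstract List<String> getSplits();
    }

    // Hypothetical stand-in for TableInputFormatBase: it does the real work.
    static class TableInputFormatBaseStub extends NewApiInputFormat {
        @Override
        List<String> getSplits() {
            List<String> splits = new ArrayList<>();
            splits.add("region-1");
            splits.add("region-2");
            return splits;
        }
    }

    // Before: one class exposes BOTH APIs through its type hierarchy.
    static class MixedInputFormat extends TableInputFormatBaseStub implements OldApiInputFormat {
        @Override
        public List<String> getSplits(int numSplits) {
            return getSplits(); // inherited new-API method
        }
    }

    // After: only the old API is visible; the new-API class is a private delegate.
    static class DelegatingInputFormat implements OldApiInputFormat {
        private final TableInputFormatBaseStub delegate = new TableInputFormatBaseStub();

        @Override
        public List<String> getSplits(int numSplits) {
            return delegate.getSplits(); // same work as before, via composition
        }
    }

    // Assumed reader-side dispatch: take the new API whenever the class is assignable to it.
    static String chooseApi(Class<?> inputFormatClass) {
        if (NewApiInputFormat.class.isAssignableFrom(inputFormatClass)) {
            return "new";
        }
        if (OldApiInputFormat.class.isAssignableFrom(inputFormatClass)) {
            return "old";
        }
        return "unsupported";
    }

    public static void main(String[] args) {
        System.out.println(chooseApi(MixedInputFormat.class));        // new
        System.out.println(chooseApi(DelegatingInputFormat.class));   // old
        System.out.println(new DelegatingInputFormat().getSplits(1)); // [region-1, region-2]
    }
}
```

With the mixed class, a reflection-based check sees the new-API supertype and takes that path even though the storage handler expects the old API; the delegating class exposes only the old API, so the choice is unambiguous while the split computation is unchanged.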
### Why are the changes needed?
It breaks compatibility between Spark and Hive, and the problem has been reported by multiple users of both Hive and Spark.
### Does this PR introduce _any_ user-facing change?
Yes. A new configuration parameter, `hive.hbase.inputformat.v2`, is added, so the documentation may need to be updated to keep end users informed.
### How was this patch tested?
Create an HBase table:
```
echo "create 'students','account','address'" | sudo -u hbase hbase shell -n
echo "put 'students','student1','account:name','Alice'" | sudo -u hbase hbase shell -n
```
Create a Hive table:
```
hive -e "create external table test1 (key string, value string)
> stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> with serdeproperties ('hbase.columns.mapping' = ':key,account:name')
> tblproperties ('hbase.table.name' = 'students')"
SLF4J: Class path contains multiple SLF4J bindings.
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j2.properties Async: true
Hive Session ID = 05b4ec22-d15f-4614-9bc3-6c183e868728
OK
Time taken: 2.913 seconds
```
Spark test:
```
spark-sql --jars /usr/lib/hive/lib/hive-hbase-handler.jar,/usr/lib/hbase/hbase-common-2.4.4.jar,/usr/lib/hbase/hbase-client-2.4.4.jar,/usr/lib/hbase/lib/hbase-mapreduce-2.4.4.jar,/usr/lib/hbase/lib/shaded-clients/hbase-shaded-client-2.4.4.jar \
  --conf spark.hive.hbase.inputformat.v2=true

spark-sql> select * from test1;
student1 Alice
```
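The `--conf spark.hive.hbase.inputformat.v2=true` option reaches Hive because Spark, as we understand it, copies `spark.hive.*` properties into the Hadoop configuration it hands to Hive, stripping the leading `spark.` prefix. The Java sketch below mimics that forwarding with plain maps. `forwardHiveConfs` and `useInputFormatV2` are hypothetical helpers, and treating a missing `hive.hbase.inputformat.v2` as `false` is an assumption; this PR does not state the default.

```java
import java.util.HashMap;
import java.util.Map;

public class SparkHiveConfSketch {
    private static final String PREFIX = "spark.hive.";

    // Illustrative only: copy "spark.hive.foo=bar" entries into a Hadoop-style conf as "hive.foo=bar".
    static Map<String, String> forwardHiveConfs(Map<String, String> sparkConf) {
        Map<String, String> hadoopConf = new HashMap<>();
        for (Map.Entry<String, String> e : sparkConf.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                String hiveKey = "hive." + e.getKey().substring(PREFIX.length());
                hadoopConf.put(hiveKey, e.getValue());
            }
        }
        return hadoopConf;
    }

    // Hypothetical read of the new flag; defaulting to false is an assumption.
    static boolean useInputFormatV2(Map<String, String> hadoopConf) {
        return Boolean.parseBoolean(hadoopConf.getOrDefault("hive.hbase.inputformat.v2", "false"));
    }

    public static void main(String[] args) {
        Map<String, String> sparkConf = new HashMap<>();
        sparkConf.put("spark.hive.hbase.inputformat.v2", "true");
        sparkConf.put("spark.executor.memory", "4g"); // not forwarded: no spark.hive. prefix
        Map<String, String> hadoopConf = forwardHiveConfs(sparkConf);
        System.out.println(hadoopConf);                   // {hive.hbase.inputformat.v2=true}
        System.out.println(useInputFormatV2(hadoopConf)); // true
    }
}
```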
Unit test:
```
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.hive.hbase.TestHiveHBaseTableInputFormatV2
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.069 s - in org.apache.hadoop.hive.hbase.TestHiveHBaseTableInputFormatV2
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
```