[
https://issues.apache.org/jira/browse/HIVE-24706?focusedWorklogId=854939&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-854939
]
ASF GitHub Bot logged work on HIVE-24706:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Apr/23 05:53
Start Date: 05/Apr/23 05:53
Worklog Time Spent: 10m
Work Description: alexdongli0829 opened a new pull request, #4199:
URL: https://github.com/apache/hive/pull/4199
### What changes were proposed in this pull request?
For HIVE-24706, the main issue is that HiveHBaseTableInputFormat implements
both versions of InputFormat, so Spark cannot determine which version to use.
The implementation is also unclear as a result.
In this request, instead of directly extending TableInputFormatBase, I use it
as a delegate that does exactly the same as before but avoids the confusion,
because HBaseStorageHandler only needs the old-version InputFormat.
In the long term, I think Hive should update the storage handler instead of
continuing to mix these two API versions.
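As a rough illustration of why delegation removes the ambiguity, here is a minimal, self-contained sketch. It does not use the Hadoop or Hive classes: `OldInputFormat` and `NewInputFormat` are hypothetical stand-ins for `org.apache.hadoop.mapred.InputFormat` and `org.apache.hadoop.mapreduce.InputFormat`, and `AmbiguousFormat` and `DelegatingFormat` are made-up names, not the classes in this PR. The check in `matchesBoth` mirrors the two `isAssignableFrom` tests quoted in the issue description.

```java
public class InputFormatAmbiguityDemo {
    // Hypothetical stand-in for the old API (org.apache.hadoop.mapred.InputFormat).
    interface OldInputFormat {}

    // Hypothetical stand-in for the new API (org.apache.hadoop.mapreduce.InputFormat).
    static abstract class NewInputFormat {}

    // "Before": one class is assignable to both API types,
    // so a caller that tests both cannot tell which one to use.
    static class AmbiguousFormat extends NewInputFormat implements OldInputFormat {}

    // "After": the class exposes only the old API and keeps the
    // new-API object as a private delegate instead of inheriting from it.
    static class DelegatingFormat implements OldInputFormat {
        private final NewInputFormat delegate = new NewInputFormat() {};

        NewInputFormat delegate() {
            return delegate;
        }
    }

    // True when the class satisfies both type checks at once.
    static boolean matchesBoth(Class<?> c) {
        return OldInputFormat.class.isAssignableFrom(c)
            && NewInputFormat.class.isAssignableFrom(c);
    }

    public static void main(String[] args) {
        System.out.println("ambiguous matches both:  " + matchesBoth(AmbiguousFormat.class));
        System.out.println("delegating matches both: " + matchesBoth(DelegatingFormat.class));
    }
}
```

With the delegate, only the old-API check succeeds, so a caller that branches on the two checks has a single unambiguous path.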
### Why are the changes needed?
It affects compatibility between Spark and Hive and has been reported by
different users of both Hive and Spark.
### Does this PR introduce _any_ user-facing change?
A configuration parameter, hive.hbase.inputformat.v2, is added, so the
documentation may need an update to keep end users informed.
### How was this patch tested?
Create the HBase table:
```
echo "create 'students','account','address'" | sudo -u hbase hbase shell -n
echo "put 'students','student1','account:name','Alice'" | sudo -u hbase hbase shell -n
```
Create the Hive table:
```
hive -e "create external table test1 (key string, value string)
> stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> with serdeproperties ('hbase.columns.mapping' = ':key,account:name')
> tblproperties ('hbase.table.name' = 'students')"
SLF4J: Class path contains multiple SLF4J bindings.
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j2.properties Async: true
Hive Session ID = 05b4ec22-d15f-4614-9bc3-6c183e868728
OK
Time taken: 2.913 seconds
```
Spark test:
```
spark-sql --jars /usr/lib/hive/lib/hive-hbase-handler.jar,/usr/lib/hbase/hbase-common-2.4.4.jar,/usr/lib/hbase/hbase-client-2.4.4.jar,/usr/lib/hbase/lib/hbase-mapreduce-2.4.4.jar,/usr/lib/hbase/lib/shaded-clients/hbase-shaded-client-2.4.4.jar --conf spark.hive.hbase.inputformat.v2=true
spark-sql> select * from test1;
student1 Alice
```
Unit Test
```
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.hive.hbase.TestHiveHBaseTableInputFormatV2
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.069 s - in org.apache.hadoop.hive.hbase.TestHiveHBaseTableInputFormatV2
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
```
Issue Time Tracking
-------------------
Worklog Id: (was: 854939)
Time Spent: 1h (was: 50m)
> Spark SQL access hive on HBase table access exception
> -----------------------------------------------------
>
> Key: HIVE-24706
> URL: https://issues.apache.org/jira/browse/HIVE-24706
> Project: Hive
> Issue Type: Bug
> Components: HBase Handler
> Reporter: zhangzhanchang
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2021-01-30-15-51-58-665.png
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> HiveHBaseTableInputFormat relies on two versions of InputFormat: one is
> org.apache.hadoop.mapred.InputFormat, the other is
> org.apache.hadoop.mapreduce.InputFormat. In Spark 3.0
> (https://github.com/apache/spark/pull/31302) this causes both conditions to be
> true:
> # classOf[oldInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true
> # classOf[newInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true
> !image-2021-01-30-15-51-58-665.png|width=430,height=137!
> Should HiveHBaseTableInputFormat be changed to rely on only one InputFormat,
> either org.apache.hadoop.mapreduce or org.apache.hadoop.mapred?
>