[ 
https://issues.apache.org/jira/browse/HIVE-24706?focusedWorklogId=854939&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-854939
 ]

ASF GitHub Bot logged work on HIVE-24706:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Apr/23 05:53
            Start Date: 05/Apr/23 05:53
    Worklog Time Spent: 10m 
      Work Description: alexdongli0829 opened a new pull request, #4199:
URL: https://github.com/apache/hive/pull/4199

   ### What changes were proposed in this pull request?
   
   For HIVE-24706, the main issue is that HiveHBaseTableInputFormat implements 
both versions of InputFormat, so Spark cannot determine which version to use. 
The implementation is also not very clear.
   
   So in this request, instead of directly extending TableInputFormatBase, I 
use it as a delegate that does exactly the same work as before. This avoids the 
confusion, because HBaseStorageHandler only needs the old-version InputFormat.
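
The delegate approach described above can be sketched as follows. This is a minimal illustration, not the code in the pull request: `OldInputFormat`, `NewInputFormat`, `TableInputFormatBaseLike`, `MixedInputFormat` and `DelegatingInputFormat` are hypothetical stand-ins for the Hadoop, HBase and Hive types, and the split names are made up.

```java
import java.util.Arrays;
import java.util.List;

public class DelegateSketch {
    // Hypothetical stand-in for org.apache.hadoop.mapred.InputFormat (old API).
    interface OldInputFormat {
        List<String> getSplits(int numSplitsHint);
    }

    // Hypothetical stand-in for org.apache.hadoop.mapreduce.InputFormat (new API).
    abstract static class NewInputFormat {
        abstract List<String> getSplits();
    }

    // Hypothetical stand-in for HBase's TableInputFormatBase, a new-API class.
    static class TableInputFormatBaseLike extends NewInputFormat {
        @Override
        List<String> getSplits() {
            return Arrays.asList("region-0", "region-1");
        }
    }

    // Before: extends the new-API base and implements the old API, so it is both.
    static class MixedInputFormat extends TableInputFormatBaseLike implements OldInputFormat {
        @Override
        public List<String> getSplits(int numSplitsHint) {
            return getSplits();
        }
    }

    // After: implements only the old API and forwards the work to a delegate.
    static class DelegatingInputFormat implements OldInputFormat {
        private final TableInputFormatBaseLike delegate = new TableInputFormatBaseLike();

        @Override
        public List<String> getSplits(int numSplitsHint) {
            return delegate.getSplits();
        }
    }

    // The same kind of type check a caller could use to pick an API version.
    static boolean isOldApi(Class<?> c) {
        return OldInputFormat.class.isAssignableFrom(c);
    }

    static boolean isNewApi(Class<?> c) {
        return NewInputFormat.class.isAssignableFrom(c);
    }

    public static void main(String[] args) {
        System.out.println("mixed:      old=" + isOldApi(MixedInputFormat.class)
                + " new=" + isNewApi(MixedInputFormat.class));
        System.out.println("delegating: old=" + isOldApi(DelegatingInputFormat.class)
                + " new=" + isNewApi(DelegatingInputFormat.class));
        System.out.println("same splits: " + new MixedInputFormat().getSplits(1)
                .equals(new DelegatingInputFormat().getSplits(1)));
    }
}
```

In this sketch the delegating class returns the same splits as the mixed one, but it is assignable only to the old-API type, so a caller that checks both types gets a single answer.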
   
   In the long term, I think Hive should update the storage handler instead of 
continuing to mix these two different API versions.
   
   
   ### Why are the changes needed?
   
   The issue affects compatibility between Spark and Hive and has been reported 
by different users of both Hive and Spark.
   
   
   ### Does this PR introduce _any_ user-facing change?
   This PR adds a configuration parameter, hive.hbase.inputformat.v2, so the 
documentation may need an update to keep end users informed.
   
   
   ### How was this patch tested?
   
   Create the HBase table:
   
   ```
   echo "create 'students','account','address'" | sudo -u hbase hbase shell -n
   echo "put 'students','student1','account:name','Alice'" | sudo -u hbase hbase shell -n
   ```
   
   Create the Hive table:
   ```
   hive -e "create external table test1 (key string, value string)
   > stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
   > with serdeproperties ('hbase.columns.mapping' = ':key,account:name')
   > tblproperties ('hbase.table.name' = 'students')"
   
   
   SLF4J: Class path contains multiple SLF4J bindings.
   Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j2.properties Async: true
   Hive Session ID = 05b4ec22-d15f-4614-9bc3-6c183e868728
   OK
   Time taken: 2.913 seconds
   ```
   
   Spark test:
   
   spark-sql --jars /usr/lib/hive/lib/hive-hbase-handler.jar,/usr/lib/hbase/hbase-common-2.4.4.jar,/usr/lib/hbase/hbase-client-2.4.4.jar,/usr/lib/hbase/lib/hbase-mapreduce-2.4.4.jar,/usr/lib/hbase/lib/shaded-clients/hbase-shaded-client-2.4.4.jar --conf spark.hive.hbase.inputformat.v2=true
   
   ```
   spark-sql> select * from test1;
   
   student1    Alice
   ```
   
   Unit Test
   
   ```
   [INFO] -------------------------------------------------------
   [INFO]  T E S T S
   [INFO] -------------------------------------------------------
   [INFO] Running org.apache.hadoop.hive.hbase.TestHiveHBaseTableInputFormatV2
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.069 s - in org.apache.hadoop.hive.hbase.TestHiveHBaseTableInputFormatV2
   [INFO]
   [INFO] Results:
   [INFO]
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
   ```




Issue Time Tracking
-------------------

    Worklog Id:     (was: 854939)
    Time Spent: 1h  (was: 50m)

> Spark SQL access hive on HBase table access exception
> -----------------------------------------------------
>
>                 Key: HIVE-24706
>                 URL: https://issues.apache.org/jira/browse/HIVE-24706
>             Project: Hive
>          Issue Type: Bug
>          Components: HBase Handler
>            Reporter: zhangzhanchang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2021-01-30-15-51-58-665.png
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> HiveHBaseTableInputFormat implements two versions of InputFormat: one is 
> org.apache.hadoop.mapred.InputFormat, the other is 
> org.apache.hadoop.mapreduce.InputFormat. As a result, in Spark 3.0 
> (https://github.com/apache/spark/pull/31302) both of the following 
> conditions are true:
>  # classOf[oldInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true
>  # classOf[newInputClass[_, _]].isAssignableFrom(inputFormatClazz) is true
> !image-2021-01-30-15-51-58-665.png|width=430,height=137!
> Should HiveHBaseTableInputFormat be changed to rely on the InputFormat from 
> org.apache.hadoop.mapreduce or the one from org.apache.hadoop.mapred?
>  
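
The two conditions quoted in the issue can be reproduced in isolation. The following is a minimal sketch using plain Java reflection. `OldApiFormat`, `NewApiFormat`, `BothApisFormat` and `OldOnlyFormat` are hypothetical stand-ins, not the Hadoop or Hive classes, and the helper methods are illustrative rather than Spark's actual code.

```java
public class AmbiguityCheck {
    // Hypothetical stand-in for org.apache.hadoop.mapred.InputFormat.
    interface OldApiFormat {}

    // Hypothetical stand-in for org.apache.hadoop.mapreduce.InputFormat.
    abstract static class NewApiFormat {}

    // A class that is assignable to both API types, like the one in the report.
    static class BothApisFormat extends NewApiFormat implements OldApiFormat {}

    // A class that is assignable to the old API type only.
    static class OldOnlyFormat implements OldApiFormat {}

    static boolean matchesOld(Class<?> c) {
        return OldApiFormat.class.isAssignableFrom(c);
    }

    static boolean matchesNew(Class<?> c) {
        return NewApiFormat.class.isAssignableFrom(c);
    }

    // True when a type check alone cannot tell which API version to use.
    static boolean isAmbiguous(Class<?> c) {
        return matchesOld(c) && matchesNew(c);
    }

    public static void main(String[] args) {
        System.out.println("BothApisFormat ambiguous: " + isAmbiguous(BothApisFormat.class));
        System.out.println("OldOnlyFormat ambiguous:  " + isAmbiguous(OldOnlyFormat.class));
    }
}
```

Under these assumptions only the class that both extends the new-API type and implements the old-API type is reported as ambiguous.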



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
