[
https://issues.apache.org/jira/browse/HIVE-13539?focusedWorklogId=446738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-446738
]
ASF GitHub Bot logged work on HIVE-13539:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 16/Jun/20 16:57
Start Date: 16/Jun/20 16:57
Worklog Time Spent: 10m
Work Description: github-actions[bot] commented on pull request #74:
URL: https://github.com/apache/hive/pull/74#issuecomment-644888500
This pull request has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the [email protected] list if the patch is in
need of reviews.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 446738)
Remaining Estimate: 0h
Time Spent: 10m
> HiveHFileOutputFormat searching the wrong directory for HFiles
> --------------------------------------------------------------
>
> Key: HIVE-13539
> URL: https://issues.apache.org/jira/browse/HIVE-13539
> Project: Hive
> Issue Type: Bug
> Components: HBase Handler
> Affects Versions: 1.1.0
> Environment: Built into CDH 5.4.7
> Reporter: Tim Robertson
> Assignee: Chaoyu Tang
> Priority: Blocker
> Fix For: 2.1.1, 2.2.0
>
> Attachments: HIVE-13539.1.patch, HIVE-13539.patch,
> hive_hfile_output_format.q, hive_hfile_output_format.q.out
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When creating HFiles for an HBase bulkload, I believe Hive looks in the
> wrong directory for the HFiles, resulting in the following exception:
> {code}
> Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.IOException: Multiple family directories found in hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Multiple family directories found in hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:188)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:958)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
>     ... 7 more
> Caused by: java.io.IOException: Multiple family directories found in hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>     at org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:158)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:185)
>     ... 11 more
> {code}
> The issue is that it looks for the HFiles in
> {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary}}
> when I believe it should be looking in the task-attempt subfolder, such as
> {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary/attempt_1461004169450_0002_r_000000_1000}}.
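For illustration only (this is not the Hive code; the class, helper, and directory names below are assumptions), the layout from the stack trace can be mimicked with plain java.nio paths to show why scanning the parent {{_temporary}} directory fails while scanning the task-attempt subfolder succeeds:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class FamilyDirScan {

    // List the subdirectories of dir, failing (as the reported exception does)
    // when more than one is found where a single column-family dir is expected.
    static Path singleFamilyDir(Path dir) throws IOException {
        List<Path> dirs = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isDirectory(p)) {
                    dirs.add(p);
                }
            }
        }
        if (dirs.size() != 1) {
            throw new IOException("Multiple family directories found in " + dir);
        }
        return dirs.get(0);
    }

    public static void main(String[] args) throws IOException {
        // Mimic .../coords_hbase/_temporary/2/_temporary holding two task attempts.
        Path temporary = Files.createTempDirectory("_temporary");
        Path attempt0 = Files.createDirectory(temporary.resolve("attempt_r_000000_0"));
        Files.createDirectory(temporary.resolve("attempt_r_000000_1000"));
        Files.createDirectory(attempt0.resolve("o")); // the single family dir "o"

        // Scanning the parent _temporary dir sees one entry per task attempt
        // and fails, matching the reported exception.
        try {
            singleFamilyDir(temporary);
        } catch (IOException expected) {
            System.out.println("parent scan: " + expected.getMessage());
        }

        // Scanning the task-attempt subfolder finds exactly one family dir.
        System.out.println("attempt scan: " + singleFamilyDir(attempt0).getFileName());
    }
}
```

Under this sketch, the fix amounts to descending into the attempt subdirectory before looking for the family directory, rather than scanning its parent.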
> This can be reproduced in any HFile creation such as:
> {code:sql}
> CREATE TABLE coords_hbase(id INT, x DOUBLE, y DOUBLE)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
>   'hbase.columns.mapping' = ':key,o:x,o:y',
>   'hbase.table.default.storage.type' = 'binary');
> SET hfile.family.path=/tmp/coords_hfiles/o;
> SET hive.hbase.generatehfiles=true;
> INSERT OVERWRITE TABLE coords_hbase
> SELECT id, decimalLongitude, decimalLatitude
> FROM source
> CLUSTER BY id;
> {code}
> Any advice is greatly appreciated.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)