[
https://issues.apache.org/jira/browse/HIVE-13539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chaoyu Tang updated HIVE-13539:
-------------------------------
Attachment: HIVE-13539.patch
qtest is not able to reproduce the seen issue since it runs the test using
local mode MR which runs multiple reducers sequentially. Manually tests have
been done to reproduce the issue in cluster and verified the patch fixes the
problem.
Attached patch is same as [~timrobertson100] provided, except some code to
throw out an exception instead of hanging for the case where the last component
of the specified hfile does not match the column family name.
> HiveHFileOutputFormat searching the wrong directory for HFiles
> --------------------------------------------------------------
>
> Key: HIVE-13539
> URL: https://issues.apache.org/jira/browse/HIVE-13539
> Project: Hive
> Issue Type: Bug
> Components: HBase Handler
> Affects Versions: 1.1.0
> Environment: Built into CDH 5.4.7
> Reporter: Tim Robertson
> Assignee: Chaoyu Tang
> Priority: Blocker
> Attachments: HIVE-13539.patch, hive_hfile_output_format.q,
> hive_hfile_output_format.q.out
>
>
> When creating HFiles for a bulkload in HBase I believe it is looking in the
> wrong directory to find the HFiles, resulting in the following exception:
> {code}
> Error: java.lang.RuntimeException: Hive Runtime Error while closing
> operators: java.io.IOException: Multiple family directories found in
> hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
> at
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
> at
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
> java.io.IOException: Multiple family directories found in
> hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
> at
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:188)
> at
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:958)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
> at
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
> ... 7 more
> Caused by: java.io.IOException: Multiple family directories found in
> hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
> at
> org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:158)
> at
> org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:185)
> ... 11 more
> {code}
> The issue is that is looks for the HFiles in
> {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary}}
> when I believe it should be looking in the task attempt subfolder, such as
> {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary/attempt_1461004169450_0002_r_000000_1000}}.
> This can be reproduced in any HFile creation such as:
> {code:sql}
> CREATE TABLE coords_hbase(id INT, x DOUBLE, y DOUBLE)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> 'hbase.columns.mapping' = ':key,o:x,o:y',
> 'hbase.table.default.storage.type' = 'binary');
> SET hfile.family.path=/tmp/coords_hfiles/o;
> SET hive.hbase.generatehfiles=true;
> INSERT OVERWRITE TABLE coords_hbase
> SELECT id, decimalLongitude, decimalLatitude
> FROM source
> CLUSTER BY id;
> {code}
> Any advice greatly appreciated
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)