[
https://issues.apache.org/jira/browse/HIVE-13539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308828#comment-15308828
]
Matt McCline commented on HIVE-13539:
-------------------------------------
Attached are the .q file and its output. I used the query text above as a
starting point. There is no failure on master... perhaps only older releases
are affected.
hbase-handler/src/test/queries/positive/hive_hfile_output_format.q
hbase-handler/src/test/results/positive/hive_hfile_output_format.q.out
> HiveHFileOutputFormat searching the wrong directory for HFiles
> --------------------------------------------------------------
>
> Key: HIVE-13539
> URL: https://issues.apache.org/jira/browse/HIVE-13539
> Project: Hive
> Issue Type: Bug
> Components: HBase Handler
> Affects Versions: 1.1.0
> Environment: Built into CDH 5.4.7
> Reporter: Tim Robertson
> Assignee: Matt McCline
> Priority: Blocker
> Attachments: hive_hfile_output_format.q,
> hive_hfile_output_format.q.out
>
>
> When creating HFiles for a bulkload in HBase I believe it is looking in the
> wrong directory to find the HFiles, resulting in the following exception:
> {code}
> Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.IOException: Multiple family directories found in hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Multiple family directories found in hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:188)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:958)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
>     ... 7 more
> Caused by: java.io.IOException: Multiple family directories found in hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary
>     at org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:158)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:185)
>     ... 11 more
> {code}
> The issue is that it looks for the HFiles in
> {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary}}
> when I believe it should be looking in the task-attempt subdirectory, such as
> {{hdfs://c1n1.gbif.org:8020/user/hive/warehouse/tim.db/coords_hbase/_temporary/2/_temporary/attempt_1461004169450_0002_r_000000_1000}}.
> This can be reproduced in any HFile creation such as:
> {code:sql}
> CREATE TABLE coords_hbase (id INT, x DOUBLE, y DOUBLE)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
>   'hbase.columns.mapping' = ':key,o:x,o:y',
>   'hbase.table.default.storage.type' = 'binary');
>
> SET hfile.family.path=/tmp/coords_hfiles/o;
> SET hive.hbase.generatehfiles=true;
>
> INSERT OVERWRITE TABLE coords_hbase
> SELECT id, decimalLongitude, decimalLatitude
> FROM source
> CLUSTER BY id;
> {code}
> Any advice greatly appreciated.
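The quoted report boils down to a directory scan happening one level too high: each task-attempt subdirectory under {{_temporary}} looks like a "family directory" when the scan starts at the wrong root. As a rough illustration only (plain {{java.nio.file}} on a local temp directory with hypothetical names, not the actual Hive/Hadoop code), the same single-subdirectory check fails against a {{_temporary}}-style root holding several attempt directories, but succeeds inside one attempt directory:

{code:java}
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch, not Hive source: mimic a writer that expects to find
// exactly one column-family subdirectory when it closes.
public class FamilyDirLookup {

    // Return the single subdirectory of `dir`; throw if there are several,
    // echoing the "Multiple family directories found" error in the report.
    static Path findSingleFamilyDir(Path dir) throws IOException {
        Path found = null;
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                if (!Files.isDirectory(child)) continue;
                if (found != null) {
                    throw new IOException("Multiple family directories found in " + dir);
                }
                found = child;
            }
        }
        if (found == null) {
            throw new IOException("No family directory found in " + dir);
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Build a layout like .../_temporary/2/_temporary/attempt_*/o
        // using made-up attempt names patterned on the report.
        Path temporaryRoot = Files.createTempDirectory("hfile-demo");
        Path attempt1 = Files.createDirectories(
                temporaryRoot.resolve("attempt_1461004169450_0002_r_000000_1000"));
        Files.createDirectories(
                temporaryRoot.resolve("attempt_1461004169450_0002_r_000001_1000"));
        Files.createDirectories(attempt1.resolve("o"));

        // Scanning the _temporary root sees one directory per task attempt,
        // so the single-family check fails, as in the reported stack trace.
        try {
            findSingleFamilyDir(temporaryRoot);
        } catch (IOException e) {
            System.out.println("scanning _temporary root: " + e.getMessage());
        }

        // Scanning inside the task-attempt subdirectory finds the lone
        // column-family directory "o".
        Path family = findSingleFamilyDir(attempt1);
        System.out.println("scanning attempt dir: found " + family.getFileName());
    }
}
{code}

If the reporter's diagnosis is right, the fix in {{HiveHFileOutputFormat}} would amount to descending into the task-attempt subdirectory before looking for the single column-family directory, rather than scanning the committer's {{_temporary}} root directly.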
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)