Karen Coppage created HIVE-23703: ------------------------------------ Summary: Major QB compaction with multiple FileSinkOperators results in data loss and one original file Key: HIVE-23703 URL: https://issues.apache.org/jira/browse/HIVE-23703 Project: Hive Issue Type: Bug Reporter: Karen Coppage Assignee: Karen Coppage
h4. Problems Example: {code:java} drop table if exists tbl2; create transactional table tbl2 (a int, b int) clustered by (a) into 4 buckets stored as ORC TBLPROPERTIES('transactional'='true','transactional_properties'='default'); insert into tbl2 values(1,2),(1,3),(1,4),(2,2),(2,3),(2,4); insert into tbl2 values(3,2),(3,3),(3,4),(4,2),(4,3),(4,4); insert into tbl2 values(5,2),(5,3),(5,4),(6,2),(6,3),(6,4);{code} E.g. in the example above, bucketId=0 when a=2 and a=6. 1. Data lossĀ In non-acid tables, an operator's temp files are named with their task id. Because of this snippet, temp files in the FileSinkOperator for compaction tables are identified by their bucket_id. {code:java} if (conf.isCompactionTable()) { fsp.initializeBucketPaths(filesIdx, AcidUtils.BUCKET_PREFIX + String.format(AcidUtils.BUCKET_DIGITS, bucketId), isNativeTable(), isSkewedStoredAsSubDirectories); } else { fsp.initializeBucketPaths(filesIdx, taskId, isNativeTable(), isSkewedStoredAsSubDirectories); } {code} So 2 temp files containing data with a=2 and a=6 will be named bucket_0 and not 000000_0 and 000000_1 as they would normally. In FileSinkOperator.commit, when data with a=2, filename: bucket_0 is moved from _task_tmp.-ext-10002 to _tmp.-ext-10002, it overwrites the files already there with a=6 data, because it too is named bucket_0. You can see in the logs: {code:java} WARN [LocalJobRunner Map Task Executor #0] exec.FileSinkOperator: Target path file:.../hive/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnNoBuckets-1591107230237/warehouse/testmajorcompaction/base_0000002_v0000013/.hive-staging_hive_2020-06-02_07-15-21_771_8551447285061957908-1/_tmp.-ext-10002/bucket_00000 with a size 610 exists. Trying to delete it. {code} 2. Results in one original file OrcFileMergeOperator merges the results of the FSOp into 1 file named 000000_0. h4. Fix 1. FSOp will store data as: taskid/bucketId. e.g. 0_0/bucket_0 2. OrcMergeFileOp, instead of merging a bunch of files into 1 file named 000000_0, will merge all files named bucket_0 into one file named bucket_0, and so on. 3. MoveTask will get rid of the taskId directories if present and only move the bucket files in them, in case OrcMergeFileOp is not run. -- This message was sent by Atlassian Jira (v8.3.4#803005)