FileSinkOperator should remove duplicated files from the same task based on file sizes
--------------------------------------------------------------------------------------

                 Key: HIVE-1492
                 URL: https://issues.apache.org/jira/browse/HIVE-1492
             Project: Hadoop Hive
          Issue Type: Bug
            Reporter: Ning Zhang

FileSinkOperator.jobClose() calls Utilities.removeTempOrDuplicateFiles() to retain only one file for each task. A task can produce multiple files because of failed attempts or speculative runs. For each task, the largest file should be retained rather than the first file encountered.
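
A minimal sketch of the proposed selection rule, for illustration only. It is not the actual Utilities.removeTempOrDuplicateFiles() code: the class LargestFilePerTask, the helpers taskIdOf() and retainLargest(), and the assumption that output files are named <taskId>_<attemptNumber> are all hypothetical, and file sizes are passed in as a plain map instead of being read from the file system.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LargestFilePerTask {

    // Hypothetical helper: treats everything before the last '_' as the task ID,
    // so "000001_0" and "000001_1" both belong to task "000001".
    static String taskIdOf(String fileName) {
        int idx = fileName.lastIndexOf('_');
        return idx < 0 ? fileName : fileName.substring(0, idx);
    }

    // Returns task ID -> name of the largest file produced by that task.
    // On a size tie, the file seen first is kept.
    static Map<String, String> retainLargest(Map<String, Long> fileSizes) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : fileSizes.entrySet()) {
            String taskId = taskIdOf(e.getKey());
            String current = kept.get(taskId);
            if (current == null || e.getValue() > fileSizes.get(current)) {
                kept.put(taskId, e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("000000_0", 1024L);
        sizes.put("000001_0", 128L);   // smaller file from an earlier attempt
        sizes.put("000001_1", 2048L);  // larger file from a later attempt
        // prints {000000=000000_0, 000001=000001_1}
        System.out.println(retainLargest(sizes));
    }
}
```

Under a "keep the first file" rule, task 000001 would retain the 128-byte file from the earlier attempt; comparing sizes keeps the 2048-byte file instead.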