[
https://issues.apache.org/jira/browse/HIVE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eugene Koifman updated HIVE-17138:
----------------------------------
Summary: FileSinkOperator/Compactor doesn't create empty files for acid
path (was: FileSinkOperator doesn't create empty files for acid path)
> FileSinkOperator/Compactor doesn't create empty files for acid path
> -------------------------------------------------------------------
>
> Key: HIVE-17138
> URL: https://issues.apache.org/jira/browse/HIVE-17138
> Project: Hive
> Issue Type: Bug
> Components: Transactions
> Affects Versions: 2.2.0
> Reporter: Eugene Koifman
> Assignee: Eugene Koifman
>
> For bucketed tables, FileSinkOperator is expected (in some cases) to produce
> a specific number of files even if they are empty.
> FileSinkOperator.closeOp(boolean abort) has logic to create files even if
> empty.
> This doesn't properly work for the Acid path. For Insert, the
> OrcRecordUpdater(s) is set up in createBucketForFileIdx() which creates the
> actual bucketN file (as of HIVE-14007, it does it regardless of whether
> RecordUpdater sees any rows). This causes empty (i.e. ORC metadata only)
> bucket files to be created for multiFileSpray=true if a particular
> FileSinkOperator.process() sees at least 1 row. For example,
> {noformat}
> create table fourbuckets (a int, b int) clustered by (a) into 4 buckets
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into fourbuckets values(0,1),(1,1);
> with mapreduce.job.reduces = 1 or 2
> {noformat}
> For Update/Delete path, OrcRecordWriter is created lazily when the 1st row
> that needs to land there is seen. Thus it never creates empty buckets no
> matter what the value of _skipFiles_ in closeOp(boolean) is.
> Once Split Update does the split early (in the operator pipeline), only the Insert
> path will matter, since base and delta are the only files that split computation,
> etc. look at. delete_delta is only for Acid internals, so there is never any
> reason to create empty files there.
> Also make sure to close RecordUpdaters in FileSinkOperator.abortWriters()
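
The difference between the two paths described above can be illustrated with a small, self-contained sketch. This is not Hive code: the class and method names are hypothetical, bucket assignment is assumed to be the key modulo the bucket count, and a single operator is assumed to be responsible for all buckets (as with mapreduce.job.reduces = 1). It only models which bucket files would exist under eager versus lazy writer creation.

```java
import java.util.Set;
import java.util.TreeSet;

public class BucketFileSketch {
    // Hypothetical bucket assignment, for illustration only: key modulo bucket count.
    static int bucketOf(int key, int numBuckets) {
        return Math.floorMod(key, numBuckets);
    }

    // Insert-like path (assumed model): once the operator sees at least one row,
    // a writer is set up for every bucket, and each writer creates its file
    // immediately, whether or not it ever receives a row.
    static Set<Integer> eagerFiles(int numBuckets, int[] keys) {
        Set<Integer> files = new TreeSet<>();
        if (keys.length > 0) {
            for (int b = 0; b < numBuckets; b++) {
                files.add(b);
            }
        }
        return files;
    }

    // Update/Delete-like path (assumed model): a writer, and therefore a file,
    // is created only when the first row for that bucket arrives.
    static Set<Integer> lazyFiles(int numBuckets, int[] keys) {
        Set<Integer> files = new TreeSet<>();
        for (int key : keys) {
            files.add(bucketOf(key, numBuckets));
        }
        return files;
    }

    public static void main(String[] args) {
        int[] keys = {0, 1}; // values of column a from the example insert
        System.out.println("eager: " + eagerFiles(4, keys));
        System.out.println("lazy:  " + lazyFiles(4, keys));
    }
}
```

Under these assumptions, the eager model yields files for all four buckets (two of them empty), while the lazy model yields files only for buckets 0 and 1.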