Eugene Koifman commented on HIVE-17138:

OrcRecordUpdater is also inconsistent about when it creates an empty file:
for "legacy" it always does, for "default" it never does.

It should get a switch, just like the one in FileSinkOperator, that checks the
engine type (and some other property).
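
A minimal sketch of what such a switch could look like. Everything here is
hypothetical: the class, the enum, the method names and the forceEmptyFiles
flag are illustration only and are not Hive APIs, and the rule itself (skip
empty files on "tez" unless forced) is an assumption, not the agreed design.

```java
import java.util.Locale;

// Hypothetical sketch: none of these names are Hive APIs.
final class EmptyFilePolicySketch {

    // Stand-ins for the "legacy" and "default" cases mentioned above.
    enum Mode { LEGACY, DEFAULT }

    // Behaviour as described in the comment: the mode alone decides.
    static boolean createsEmptyFileToday(Mode mode) {
        return mode == Mode.LEGACY;
    }

    // Proposed shape: a single switch driven by the engine type and one
    // other property. ASSUMPTION: the rule below is illustrative only.
    static boolean shouldCreateEmptyFile(String engine, boolean forceEmptyFiles) {
        if (forceEmptyFiles) {
            return true;
        }
        return !"tez".equals(engine.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(shouldCreateEmptyFile("mr", false));  // true
        System.out.println(shouldCreateEmptyFile("tez", false)); // false
        System.out.println(shouldCreateEmptyFile("tez", true));  // true
    }
}
```

The point of the sketch is only that the decision lives in one place and
depends on configuration, rather than on which code path built the updater.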

> FileSinkOperator doesn't create empty files for acid path
> ---------------------------------------------------------
>                 Key: HIVE-17138
>                 URL: https://issues.apache.org/jira/browse/HIVE-17138
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 2.2.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>
> For bucketed tables, FileSinkOperator is expected (in some cases) to produce
> a specific number of files even if they are empty.
> FileSinkOperator.closeOp(boolean abort) has logic to create files even if 
> empty.
> This doesn't work properly for the Acid path. For Insert, the
> OrcRecordUpdater(s) are set up in createBucketForFileIdx(), which creates the
> actual bucketN file (as of HIVE-14007, it does so regardless of whether the
> RecordUpdater sees any rows). This causes empty (i.e. ORC-metadata-only)
> bucket files to be created for multiFileSpray=true if a particular
> FileSinkOperator.process() sees at least 1 row. For example,
> {noformat}
> create table fourbuckets (a int, b int) clustered by (a) into 4 buckets 
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into fourbuckets values(0,1),(1,1);
> with mapreduce.job.reduces = 1 or 2 
> {noformat}
> For the Update/Delete path, OrcRecordWriter is created lazily when the 1st row
> that needs to land there is seen. Thus it never creates empty buckets, no
> matter what the value of _skipFiles_ in closeOp(boolean) is.
> Once Split Update does the split early (in the operator pipeline), only the
> Insert path will matter, since base and delta are the only files that split
> computation, etc. look at. delete_delta is only for Acid internals, so there is
> never any reason to create empty files there.
> Also make sure to close RecordUpdaters in FileSinkOperator.abortWriters().
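
The difference between the two paths in the description can be illustrated
with a toy model of a single sink writing the fourbuckets example. This is a
sketch under stated assumptions, not Hive code: the class and method names are
hypothetical, and the bucket of a row is assumed to be its key modulo the
bucket count.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: maps bucket id -> number of rows written to its file.
final class BucketFileSketch {

    // ASSUMPTION: bucket = key mod numBuckets.
    static int bucketOf(int key, int numBuckets) {
        return Math.floorMod(key, numBuckets);
    }

    // Insert-like path: once the sink sees at least one row, a file exists
    // for every bucket, whether or not any row lands in it.
    static Map<Integer, Integer> eager(int numBuckets, List<Integer> keys) {
        Map<Integer, Integer> files = new TreeMap<>();
        if (keys.isEmpty()) {
            return files; // no row seen, so nothing was set up
        }
        for (int b = 0; b < numBuckets; b++) {
            files.put(b, 0);
        }
        for (int key : keys) {
            files.merge(bucketOf(key, numBuckets), 1, Integer::sum);
        }
        return files;
    }

    // Update/Delete-like path: a file appears only when its first row arrives.
    static Map<Integer, Integer> lazy(int numBuckets, List<Integer> keys) {
        Map<Integer, Integer> files = new TreeMap<>();
        for (int key : keys) {
            files.merge(bucketOf(key, numBuckets), 1, Integer::sum);
        }
        return files;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(0, 1); // column a of the two rows
        System.out.println(eager(4, keys)); // {0=1, 1=1, 2=0, 3=0}
        System.out.println(lazy(4, keys));  // {0=1, 1=1}
    }
}
```

In the eager case buckets 2 and 3 exist but hold no rows, which corresponds to
the metadata-only files described above; in the lazy case they are never
created, regardless of any later decision in closeOp(boolean).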
