[
https://issues.apache.org/jira/browse/HIVE-20532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
mahesh kumar behera resolved HIVE-20532.
----------------------------------------
Resolution: Duplicate
> One of the tasks, either move or add partition, can be avoided in repl load
> flow
> -------------------------------------------------------------------------------
>
> Key: HIVE-20532
> URL: https://issues.apache.org/jira/browse/HIVE-20532
> Project: Hive
> Issue Type: Sub-task
> Components: repl
> Affects Versions: 4.0.0
> Reporter: mahesh kumar behera
> Assignee: mahesh kumar behera
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
>
> In replication load, both add partition and insert operations are handled
> through import. Import creates three major tasks: copy, add partition, and
> move. Copy copies the data from the source location to the staging
> directory. Add partition (a DDL task that runs in parallel with copy)
> creates the partition in the metastore. It is a no-op for insert, because
> by the time this DDL task executes for an insert, the partition is already
> present. The third task, move, moves the files from the staging directory
> to the actual table location and, in the case of insert, adds the insert
> event to the notification table. It also does this for the add partition
> operation, which is redundant because the DDL task has already written the
> add partition event. With the optimization that copies directly to the
> actual table location in S3, the move task can be avoided when replaying an
> add partition operation, and replay of an insert need not create the add
> partition (DDL) task.
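
The task selection described above can be sketched roughly as follows. This is only an illustrative model in Python, not Hive's actual import code: the names `ReplEvent` and `plan_import_tasks` and the `copy_directly_to_table` flag are hypothetical, and the task names are plain strings standing in for the real copy, DDL, and move tasks.

```python
from enum import Enum


class ReplEvent(Enum):
    """Hypothetical labels for the two replayed operations discussed above."""
    ADD_PARTITION = "add_partition"
    INSERT = "insert"


def plan_import_tasks(event, copy_directly_to_table):
    """Return the set of tasks an import would need for one replayed event.

    The list names the tasks only; it does not imply execution order
    (the description notes that add partition runs in parallel with copy).
    """
    tasks = ["copy"]  # data always has to be copied from the source
    if event is ReplEvent.ADD_PARTITION:
        # The DDL task creates the partition and writes its event.
        tasks.append("add_partition")
        # Move is needed only if the copy went to a staging directory.
        if not copy_directly_to_table:
            tasks.append("move")
    else:
        # For insert the partition already exists, so the DDL task is
        # skipped; move is kept because it records the insert event.
        tasks.append("move")
    return tasks


if __name__ == "__main__":
    for event in ReplEvent:
        print(event.value, plan_import_tasks(event, copy_directly_to_table=True))
```

Under these assumptions, each replayed event ends up with two tasks instead of three when data is copied directly to the table location.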