mahesh kumar behera created HIVE-20533:
------------------------------------------

             Summary: Adding notification is taking time in S3 replication
                 Key: HIVE-20533
                 URL: https://issues.apache.org/jira/browse/HIVE-20533
             Project: Hive
          Issue Type: Sub-task
          Components: repl
    Affects Versions: 4.0.0
            Reporter: mahesh kumar behera
            Assignee: mahesh kumar behera
             Fix For: 4.0.0


In replication load, both add partition and insert operations are handled
through import. Import creates three major tasks: copy, add partition and
move. Copy copies the data from the source location to the staging directory.
Add partition (which runs in parallel with copy) creates the partition in the
metastore. It is a no-op in the case of insert, because by the time this DDL
task is executed for an insert the partition is already present. The third
task, move, moves the files from the staging directory to the actual location
and then, in the case of insert, adds the insert event to the notification
table. It also adds this event for the add partition operation, which is
redundant because the event for add partition has already been written by the
DDL task. With the optimization to copy directly to the actual table location
in S3, the move task can be avoided when replaying an add partition operation,
and replay of an insert need not create the add partition (DDL) task.
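
The proposed task selection can be sketched roughly as follows. This is only
an illustration of the description above, not Hive's actual import code: the
class, enum and method names (ReplLoadPlanSketch, Op, Task, planTasks) and the
directCopyToTarget flag are hypothetical. It assumes move is kept for insert
replay because move is the task that writes the insert event.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; none of these names exist in Hive.
public class ReplLoadPlanSketch {
    // The two replicated operations discussed in this issue.
    enum Op { ADD_PARTITION, INSERT }

    // The three major tasks that import creates.
    enum Task { COPY, ADD_PARTITION_DDL, MOVE }

    // Returns the tasks to create when replaying one operation.
    // directCopyToTarget models the S3 optimization in which copy writes
    // straight to the table location instead of a staging directory.
    static Set<Task> planTasks(Op op, boolean directCopyToTarget) {
        if (!directCopyToTarget) {
            // Current behaviour: all three tasks for both operations.
            return EnumSet.allOf(Task.class);
        }
        if (op == Op.ADD_PARTITION) {
            // The DDL task already writes the add partition event, and no
            // staging-to-target move is needed, so move is skipped.
            return EnumSet.of(Task.COPY, Task.ADD_PARTITION_DDL);
        }
        // Insert: the partition already exists, so the DDL task is skipped.
        // Move is kept here (assumption) since it adds the insert event.
        return EnumSet.of(Task.COPY, Task.MOVE);
    }

    public static void main(String[] args) {
        for (Op op : Op.values()) {
            System.out.println(op + ": staged=" + planTasks(op, false)
                    + ", direct=" + planTasks(op, true));
        }
    }
}
```

In this sketch each replayed operation creates two tasks instead of three when
the direct copy optimization is enabled.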



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
