[ 
https://issues.apache.org/jira/browse/HIVE-21197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-21197:
---------------------------------------
    Status: Patch Available  (was: Open)

> Hive Replication can add duplicate data during migration to a target with 
> hive.strict.managed.tables enabled
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21197
>                 URL: https://issues.apache.org/jira/browse/HIVE-21197
>             Project: Hive
>          Issue Type: Task
>          Components: repl
>            Reporter: mahesh kumar behera
>            Assignee: mahesh kumar behera
>            Priority: Major
>         Attachments: HIVE-21197.01.patch, HIVE-21197.02.patch
>
>
> During bootstrap phase it may happen that the files copied to target are 
> created by events which are not part of the bootstrap. This is because of the 
> fact that, bootstrap first gets the last event id and then the file list. 
> During this period if some event are added, then bootstrap will include files 
> created by these events also.The same files will be copied again during the 
> first incremental replication just after the bootstrap. In normal scenario, 
> the duplicate copy does not cause any issue as hive allows the use of target 
> database only after the first incremental. But in case of migration, the file 
> at source and target are copied to different location (based on the write id 
> at target) and thus this may lead to duplicate data at target. This can be 
> avoided by having at check at load time for duplicate file. This check can be 
> done only for the first incremental and the search can be done in the 
> bootstrap directory (with write id 1). if the file is already present then 
> just ignore the copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to