[
https://issues.apache.org/jira/browse/HIVE-17196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16167348#comment-16167348
]
Daniel Dai commented on HIVE-17196:
-----------------------------------
Actually, this won't result in a filename conflict even without the filename
change. During event loading, Hive first uses ReplCopyTask to copy the file to
the staging dir of that event, and then uses MoveTask to move it to the
destination. MoveTask checks the filename: if a file with the same name already
exists at the destination, MoveTask generates a new filename. So the patch is
not necessary.
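
As a rough illustration of the copy-then-move flow described above, here is a
minimal Python sketch (not Hive code). The helper names load_event and
move_with_rename, the per-event staging layout, and the "_copy_N" suffix are
all assumptions made for the example; the actual ReplCopyTask and MoveTask
logic may differ.

```python
import os
import shutil
import tempfile


def move_with_rename(src, dest_dir):
    """Move src into dest_dir, choosing a new name if the name is taken."""
    base = os.path.basename(src)
    target = os.path.join(dest_dir, base)
    counter = 1
    while os.path.exists(target):
        # Hypothetical suffix scheme; the real MoveTask naming may differ.
        target = os.path.join(dest_dir, f"{base}_copy_{counter}")
        counter += 1
    shutil.move(src, target)
    return target


def load_event(cm_file, staging_root, event_id, dest_dir):
    """Copy into the event's own staging dir, then move to the destination."""
    staging = os.path.join(staging_root, f"event_{event_id}")
    os.makedirs(staging, exist_ok=True)
    staged = shutil.copy(cm_file, staging)     # stands in for ReplCopyTask
    return move_with_rename(staged, dest_dir)  # stands in for MoveTask


def demo():
    """Replay two insert events that both read the same cmroot file."""
    root = tempfile.mkdtemp()
    cmroot = os.path.join(root, "cmroot")
    dest = os.path.join(root, "warehouse", "t1")
    os.makedirs(cmroot)
    os.makedirs(dest)
    cm_file = os.path.join(cmroot, "checksum_abc")  # checksum used as name
    with open(cm_file, "w") as f:
        f.write("X\n")
    staging_root = os.path.join(root, "staging")
    load_event(cm_file, staging_root, 1, dest)
    load_event(cm_file, staging_root, 2, dest)
    names = sorted(os.listdir(dest))
    shutil.rmtree(root)
    return names
```

Under these assumptions, demo() leaves two files at the destination, because
each event stages its copy separately and the second move picks a fresh name.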
> CM: ReplCopyTask should retain the original file names even if copied from CM
> path.
> -----------------------------------------------------------------------------------
>
> Key: HIVE-17196
> URL: https://issues.apache.org/jira/browse/HIVE-17196
> Project: Hive
> Issue Type: Sub-task
> Components: repl
> Affects Versions: 2.1.0
> Reporter: Sankar Hariappan
> Assignee: Daniel Dai
> Fix For: 3.0.0
>
> Attachments: HIVE-17196.1.patch
>
>
> Consider the scenario below:
> 1. Insert into table T1 with value(X).
> 2. Insert into table T1 with value(X).
> 3. Truncate the table T1.
> – This step backs up 2 files with the same content to cmroot, which ends up
> with one file in cmroot because the checksums match.
> 4. Incremental repl with the above 3 operations.
> – In this step, both insert event files are read from cmroot, where copying
> one overwrites the other because the file name is the same in the cm path
> (the checksum is used as the file name).
> So, this leads to data loss, and hence it is necessary to retain the original
> file names even if we copy from the cm path.
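
The scenario quoted above can be modelled with a small Python sketch that uses
dictionaries in place of directories. This is only an illustration: SHA-256
stands in for whatever checksum cmroot actually uses, and the original file
names "000000_0" and "000000_0_copy_1" are hypothetical.

```python
import hashlib


def backup_to_cmroot(cmroot, content):
    """Store content under its checksum; identical content collapses to one entry."""
    key = hashlib.sha256(content).hexdigest()  # assumed checksum algorithm
    cmroot[key] = content
    return key


def replay_with_cm_names(cmroot, events):
    """Copy each event's file using the cm (checksum) name as the target name."""
    dest = {}
    for _original_name, key in events:
        dest[key] = cmroot[key]  # same key twice: the second copy overwrites
    return dest


def replay_with_original_names(cmroot, events):
    """Copy each event's file but keep the name it had in the source table."""
    dest = {}
    for original_name, key in events:
        dest[original_name] = cmroot[key]
    return dest


cmroot = {}
# Two inserts of the same value, both backed up to cmroot by the truncate.
events = [
    ("000000_0", backup_to_cmroot(cmroot, b"X\n")),         # hypothetical name
    ("000000_0_copy_1", backup_to_cmroot(cmroot, b"X\n")),  # hypothetical name
]

lost = replay_with_cm_names(cmroot, events)
kept = replay_with_original_names(cmroot, events)
```

In this model, cmroot and lost each hold a single entry while kept holds two,
which is the difference between copying under the checksum name into a shared
target and copying under the original names.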