[ https://issues.apache.org/jira/browse/HIVE-23816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

rameshkrishnan muthusamy updated HIVE-23816:
--------------------------------------------
    Description: 
During partition registration via the Thrift API, we are noticing that the 
associated HDFS directory is being deleted even though it was not created by 
the process that deletes it. 

This results in loss of the data under that directory. In the example below, 
3 threads try to create the same directory and only one of them succeeds in 
registering the partition; the other 2 threads then delete the directory that 
was created and registered by the successful thread. 


hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,307 INFO 
org.apache.hadoop.hive.common.FileUtils: [pool-5-thread-379217]: Creating 
directory if it doesn't exist: hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,308 INFO 
org.apache.hadoop.hive.common.FileUtils: [pool-5-thread-386717]: Creating 
directory if it doesn't exist: hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,308 INFO 
org.apache.hadoop.hive.common.FileUtils: [pool-5-thread-379074]: Creating 
directory if it doesn't exist: hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,314 INFO 
hive.metastore.hivemetastoressimpl: [pool-5-thread-386717]: deleting 
hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,315 INFO 
hive.metastore.hivemetastoressimpl: [pool-5-thread-379217]: deleting 
hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,321 INFO 
org.apache.hadoop.fs.TrashPolicyDefault: [pool-5-thread-386717]: Moved: 
'hdfs://test_path/dt=2020-07-02/hhmm-0850' to trash at: 
hdfs://user/test/.Trash/Current/test/dt=2020-07-02/hhmm=0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,321 INFO 
hive.metastore.hivemetastoressimpl: [pool-5-thread-386717]: Moved to trash: 
hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,323 ERROR hive.log: 
[pool-5-thread-379217]: Got exception: java.io.IOException Failed to move to 
trash: hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:java.io.IOException: Failed to move to 
trash: hdfs://test_path/dt=2020-07-02/hhmm-0850
hadoop-cmf-hive-HIVEMETASTORE-******.41:2020-07-02 08:50:31,328 ERROR 
org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-5-thread-379217]: 
MetaException(message:Got exception: java.io.IOException Failed to move to 
trash: hdfs://test_path/dt=2020-07-02/hhmm-0850)
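
A minimal sketch, in Java, of one way to avoid the failure mode described above. It rests on stated assumptions: `PartitionRegistrar`, `register`, `race` and `RaceResult` are hypothetical names, an in-memory set stands in for the metastore's partition table, and the local filesystem (via `java.nio.file`) stands in for HDFS. It is not Hive's code and may differ from how this issue is actually resolved. The log shows every thread running a "create directory if it doesn't exist" step, so a thread cannot tell a directory it made from one another thread already made; if its failure cleanup then deletes that directory, it removes the winner's data. The sketch reverses the order: a caller first claims the partition atomically, and only the winner touches the directory, so losing callers never create or delete anything on disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not Hive code: claim the partition before touching storage.
public class PartitionRegistrar {
    // Assumption: an in-memory set stands in for the metastore's partition table.
    private final Set<String> registered = ConcurrentHashMap.newKeySet();

    // Returns true only for the single caller that registered the partition.
    public boolean register(Path dir) throws IOException {
        String key = dir.toString();
        if (!registered.add(key)) {
            return false; // loser: never creates or deletes anything on disk
        }
        try {
            Files.createDirectories(dir); // like mkdir -p: an existing directory is not an error
            return true;
        } catch (IOException e) {
            registered.remove(key); // undo only this caller's claim; leave the directory alone
            throw e;
        }
    }

    // Hypothetical result holder for one simulated race.
    public static final class RaceResult {
        public final int winners;
        public final boolean dataSurvived;

        RaceResult(int winners, boolean dataSurvived) {
            this.winners = winners;
            this.dataSurvived = dataSurvived;
        }
    }

    // Starts `threads` registrations of the same partition path at the same moment.
    public static RaceResult race(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Path dir = Files.createTempDirectory("hive23816")
                    .resolve("dt=2020-07-02").resolve("hhmm-0850");
            PartitionRegistrar registrar = new PartitionRegistrar();
            CountDownLatch start = new CountDownLatch(1);
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(() -> {
                    start.await(); // release all threads together to maximize contention
                    boolean won = registrar.register(dir);
                    if (won) {
                        Files.writeString(dir.resolve("part-00000"), "data"); // winner writes data
                    }
                    return won;
                }));
            }
            start.countDown();
            int winners = 0;
            for (Future<Boolean> f : results) {
                if (f.get()) {
                    winners++;
                }
            }
            return new RaceResult(winners, Files.exists(dir.resolve("part-00000")));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        RaceResult r = race(3);
        System.out.println("winners=" + r.winners + " dataSurvived=" + r.dataSurvived);
    }
}
```

Running `main` prints `winners=1 dataSurvived=true`. One trade-off of claim-first ordering: a reader could briefly see a registered partition whose directory does not exist yet; a real metastore would have to account for that (for example inside its transaction), which this sketch ignores.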

 

  was:
During partition registration via the Thrift API, we are noticing that the 
associated HDFS directory is being deleted even though it was not created by 
the process that deletes it. 

This results in loss of the data under that directory. 

 


>  Concurrent access of metastore dynamic partition registration API resulting 
> in data loss due to HDFS dir deletion 
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-23816
>                 URL: https://issues.apache.org/jira/browse/HIVE-23816
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: rameshkrishnan muthusamy
>            Assignee: rameshkrishnan muthusamy
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
