[ 
https://issues.apache.org/jira/browse/KAFKA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17262536#comment-17262536
 ] 

kaushik srinivas commented on KAFKA-12164:
------------------------------------------

[~ijuma]

Need your inputs.

> ssue when kafka connect worker pod restart, during creation of nested 
> partition directories in hdfs file system.
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-12164
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12164
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>            Reporter: kaushik srinivas
>            Priority: Critical
>
> In our production labs, an issue is observed. Below is the sequence of the 
> same.
>  # hdfs connector is added to the connect worker.
>  # hdfs connector is creating folders in hdfs /test1=1/test2=2/
> Based on the custom partitioner. Here test1 and test2 are two separate nested 
> directories derived from multiple fields in the record using a custom 
> partitioner.
>  # Now kafka connect hdfs connector uses below function calls to create the 
> directories in the hdfs file system.
> fs.mkdirs(new Path(filename));
> ref: 
> [https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/storage/HdfsStorage.java]
> Now the important thing to note is that if mkdirs() is a non atomic operation 
> (i.e can result in partial execution if interrupted)
> then suppose the first directory ie test1 is created and just before creation 
> of test2 in hdfs happens if there is a restart to the connect worker pod. 
> Then the hdfs file system will remain with partial folders created for 
> partitions during the restart time frames.
> So we might have conditions in hdfs as below
> /test1=0/test2=0/
> /test1=1/
> /test1=2/test2=2
> /test1=3/test2=3
> So the second partition has a missing directory in it. And if hive 
> integration is enabled, hive metastore exceptions will occur since there is a 
> partition expected from hive table is missing for few partitions in hdfs.
> *This can occur to any connector with some ongoing non atomic operation and a 
> restart is triggered to kafka connect worker pod. This will result in some 
> partially completed states in the system and may cause issues for the 
> connector to continue its operation*.
> *This is a very critical issue and needs some attention on ways for handling 
> the same.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to