[
https://issues.apache.org/jira/browse/KAFKA-12164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Greg Harris resolved KAFKA-12164.
---------------------------------
Resolution: Invalid
This issue appears to deal only with a Connect plugin, which is not supported
by the Apache Kafka project. If/when an issue with the Connect framework is
implicated, a new ticket may be opened with details about that issue.
> Issue when Kafka Connect worker pod restarts during creation of nested
> partition directories in HDFS file system.
> ----------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-12164
> URL: https://issues.apache.org/jira/browse/KAFKA-12164
> Project: Kafka
> Issue Type: Bug
> Components: connect
> Reporter: kaushik srinivas
> Priority: Critical
>
> In our production labs, we observed the following issue. The sequence of
> events is:
> # The HDFS connector is added to the Connect worker.
> # The HDFS connector creates folders in HDFS such as /test1=1/test2=2/
> based on the custom partitioner. Here test1 and test2 are two separate nested
> directories derived from multiple fields in the record by the custom
> partitioner.
> # The Kafka Connect HDFS connector uses the following call to create the
> directories in the HDFS file system:
> fs.mkdirs(new Path(filename));
> ref:
> [https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/storage/HdfsStorage.java]
> The important thing to note is that if mkdirs() is a non-atomic operation
> (i.e., it can be left partially executed if interrupted), then the following
> can happen: the first directory, test1, is created, and the Connect worker
> pod restarts just before test2 is created in HDFS.
> The HDFS file system is then left with partially created partition folders
> for the time frame of the restart.
> So we might end up with the following state in HDFS:
> /test1=0/test2=0/
> /test1=1/
> /test1=2/test2=2
> /test1=3/test2=3
> Here the second partition is missing its nested directory. If Hive
> integration is enabled, Hive metastore exceptions will occur, because a
> partition that the Hive table expects is missing in HDFS for some partitions.
> *This can happen to any connector that has a non-atomic operation in progress
> when a restart of the Kafka Connect worker pod is triggered. It leaves the
> system in a partially completed state and may prevent the connector from
> continuing its operation.*
> *This is a very critical issue and needs attention on ways of handling it.*
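For illustration only, here is a minimal sketch of how the partial state described above could be detected and repaired after a restart. It is not the connector's code: it uses java.nio.file on a local directory as a stand-in for HDFS, and the class and method names (PartitionDirRepair, ensurePartitionDir, findIncomplete) are hypothetical. It relies on directory creation being idempotent. Files.createDirectories is idempotent, and Hadoop's FileSystem.mkdirs is documented as having roughly `mkdir -p` semantics, so a repeated call on the full path should fill in the missing level.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; a local-filesystem stand-in for the HDFS layout in the report.
class PartitionDirRepair {

    // Creates root/level1/level2/... and succeeds if some or all levels already exist,
    // so calling it again after an interrupted attempt completes the path.
    public static Path ensurePartitionDir(Path root, String... levels) throws IOException {
        Path dir = root;
        for (String level : levels) {
            dir = dir.resolve(level);
        }
        return Files.createDirectories(dir);
    }

    // Returns first-level partition directories (e.g. test1=1) that contain
    // no nested directory (e.g. no test2=...), i.e. the partial state.
    public static List<Path> findIncomplete(Path root) throws IOException {
        List<Path> firstLevel;
        try (Stream<Path> entries = Files.list(root)) {
            firstLevel = entries.filter(Files::isDirectory).collect(Collectors.toList());
        }
        List<Path> incomplete = new ArrayList<>();
        for (Path dir : firstLevel) {
            try (Stream<Path> children = Files.list(dir)) {
                if (children.noneMatch(Files::isDirectory)) {
                    incomplete.add(dir);
                }
            }
        }
        return incomplete;
    }
}
```

With the layout from the report, findIncomplete would report /test1=1/, and calling ensurePartitionDir(root, "test1=1", "test2=2") again would complete it. Whether a connector plugin retries directory creation this way on restart is a question for that plugin, not for the Connect framework.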
--
This message was sent by Atlassian Jira
(v8.20.10#820010)