[ 
https://issues.apache.org/jira/browse/KAFKA-13231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-13231.
---------------------------------
    Fix Version/s: 3.0.1
                   2.8.1
                   3.1.0
         Reviewer: Jason Gustafson
       Resolution: Fixed

> `TransactionalMessageCopier.start_node` should wait until the process if 
> fully started
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13231
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13231
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: David Jacot
>            Assignee: David Jacot
>            Priority: Major
>             Fix For: 3.1.0, 2.8.1, 3.0.1
>
>
> There is a subtile race condition in the code which bounces the transaction 
> message copier. As you can see in the log snippet above, it is possible that 
> the copier get bounced before it even starts. Note that the process is stated 
> by a separate thread. In this case, the method which stops the current 
> process miss it because the PID is not there yet. However, the stop hangs 
> because the thread does not stop as expected.
> It seems that we should wait until the process is fully started in `restart` 
> or `start_node` to avoid this issue.
> {noformat}
> [INFO  - 2021-08-24 07:48:25,882 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 5250, remaining 28083
> [INFO  - 2021-08-24 07:48:26,121 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 6000, remaining 27333
> [INFO  - 2021-08-24 07:48:26,379 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 6750, remaining 26583
> [INFO  - 2021-08-24 07:48:26,536 - transactions_test - bounce_copiers - 
> lineno:144]: copier-0 - progress: 20.25020250202502
> [DEBUG - 2021-08-24 07:48:26,536 - remoteaccount - _log - lineno:160]: 
> ubuntu@worker22: Running ssh command: jps | grep -i 
> TransactionalMessageCopier | awk '{print $1}'
> [DEBUG - 2021-08-24 07:48:26,692 - remoteaccount - _log - lineno:160]: 
> ubuntu@worker22: Running ssh command: kill -9 1567
> [DEBUG - 2021-08-24 07:48:26,733 - remoteaccount - _log - lineno:160]: 
> ubuntu@worker22: Running ssh command: jps | grep -i 
> TransactionalMessageCopier | awk '{print $1}'
> [INFO  - 2021-08-24 07:48:27,021 - background_thread - start_node - 
> lineno:57]: Running TransactionalMessageCopier-0-139963594423096 node 1 on 
> worker22
> [DEBUG - 2021-08-24 07:48:27,021 - remoteaccount - _log - lineno:160]: 
> ubuntu@worker22: Running ssh command: mkdir -p 
> /mnt/transactional_message_copier
> [DEBUG - 2021-08-24 07:48:27,068 - remoteaccount - _log - lineno:160]: 
> ubuntu@worker22: Running ssh command: java -version
> [INFO  - 2021-08-24 07:48:27,163 - kafka - bootstrap_servers - lineno:2606]: 
> Bootstrap client port is: 9092
> [DEBUG - 2021-08-24 07:48:27,163 - transactional_message_copier - _worker - 
> lineno:85]: TransactionalMessageCopier 1 command: export 
> LOG_DIR=/mnt/transactional_message_copier/logs; export KAFKA_OPTS=; export 
> KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:/mnt/transactional_message_copier/tools-log4j.properties";
>  /opt/kafka-dev/bin/kafka-run-class.sh 
> org.apache.kafka.tools.TransactionalMessageCopier --broker-list 
> worker13:9092,worker9:9092,worker20:9092 --transactional-id copier-0 
> --consumer-group transactions-test-consumer-group --input-topic input-topic 
> --output-topic output-topic --input-partition 0 --transaction-size 750 
> --transaction-timeout 40000 --enable-random-aborts 2>> 
> /mnt/transactional_message_copier/transactional_message_copier.stderr | tee 
> -a /mnt/transactional_message_copier/transactional_message_copier.stdout &
> [DEBUG - 2021-08-24 07:48:27,163 - remoteaccount - _log - lineno:160]: 
> ubuntu@worker22: Running ssh command: export 
> LOG_DIR=/mnt/transactional_message_copier/logs; export KAFKA_OPTS=; export 
> KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:/mnt/transactional_message_copier/tools-log4j.properties";
>  /opt/kafka-dev/bin/kafka-run-class.sh 
> org.apache.kafka.tools.TransactionalMessageCopier --broker-list 
> worker13:9092,worker9:9092,worker20:9092 --transactional-id copier-0 
> --consumer-group transactions-test-consumer-group --input-topic input-topic 
> --output-topic output-topic --input-partition 0 --transaction-size 750 
> --transaction-timeout 40000 --enable-random-aborts 2>> 
> /mnt/transactional_message_copier/transactional_message_copier.stderr | tee 
> -a /mnt/transactional_message_copier/transactional_message_copier.stdout &
> [INFO  - 2021-08-24 07:49:08,280 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 0, remaining 26583
> [INFO  - 2021-08-24 07:49:08,763 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 750, remaining 25833
> [INFO  - 2021-08-24 07:49:09,034 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 1500, remaining 25083
> [INFO  - 2021-08-24 07:49:09,279 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 1500, remaining 25083
> [INFO  - 2021-08-24 07:49:09,553 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 2250, remaining 24333
> [INFO  - 2021-08-24 07:49:09,793 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 2250, remaining 24333
> [INFO  - 2021-08-24 07:49:10,059 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 3000, remaining 23583
> [INFO  - 2021-08-24 07:49:10,312 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 3750, remaining 22833
> [INFO  - 2021-08-24 07:49:10,661 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 4500, remaining 22083
> [INFO  - 2021-08-24 07:49:11,056 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 5250, remaining 21333
> [INFO  - 2021-08-24 07:49:11,402 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 5250, remaining 21333
> [INFO  - 2021-08-24 07:49:11,698 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 6000, remaining 20583
> [INFO  - 2021-08-24 07:49:11,876 - transactions_test - bounce_copiers - 
> lineno:144]: copier-0 - progress: 22.570815934996048
> [DEBUG - 2021-08-24 07:49:11,876 - remoteaccount - _log - lineno:160]: 
> ubuntu@worker22: Running ssh command: jps | grep -i 
> TransactionalMessageCopier | awk '{print $1}'
> [INFO  - 2021-08-24 07:49:11,882 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 6000, remaining 20583
> [DEBUG - 2021-08-24 07:49:12,089 - remoteaccount - _log - lineno:160]: 
> ubuntu@worker22: Running ssh command: kill -9 2195
> [DEBUG - 2021-08-24 07:49:12,129 - remoteaccount - _log - lineno:160]: 
> ubuntu@worker22: Running ssh command: jps | grep -i 
> TransactionalMessageCopier | awk '{print $1}'
> [INFO  - 2021-08-24 07:49:12,425 - background_thread - start_node - 
> lineno:57]: Running TransactionalMessageCopier-0-139963594423096 node 1 on 
> worker22
> [DEBUG - 2021-08-24 07:49:12,426 - remoteaccount - _log - lineno:160]: 
> ubuntu@worker22: Running ssh command: mkdir -p 
> /mnt/transactional_message_copier
> [DEBUG - 2021-08-24 07:49:12,472 - remoteaccount - _log - lineno:160]: 
> ubuntu@worker22: Running ssh command: java -version
> [INFO  - 2021-08-24 07:49:12,526 - transactions_test - bounce_copiers - 
> lineno:144]: copier-0 - progress: 22.570815934996048
> [DEBUG - 2021-08-24 07:49:12,526 - remoteaccount - _log - lineno:160]: 
> ubuntu@worker22: Running ssh command: jps | grep -i 
> TransactionalMessageCopier | awk '{print $1}'
> [INFO  - 2021-08-24 07:49:12,567 - kafka - bootstrap_servers - lineno:2606]: 
> Bootstrap client port is: 9092
> [DEBUG - 2021-08-24 07:49:12,567 - transactional_message_copier - _worker - 
> lineno:85]: TransactionalMessageCopier 1 command: export 
> LOG_DIR=/mnt/transactional_message_copier/logs; export KAFKA_OPTS=; export 
> KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:/mnt/transactional_message_copier/tools-log4j.properties";
>  /opt/kafka-dev/bin/kafka-run-class.sh 
> org.apache.kafka.tools.TransactionalMessageCopier --broker-list 
> worker13:9092,worker9:9092,worker20:9092 --transactional-id copier-0 
> --consumer-group transactions-test-consumer-group --input-topic input-topic 
> --output-topic output-topic --input-partition 0 --transaction-size 750 
> --transaction-timeout 40000 --enable-random-aborts 2>> 
> /mnt/transactional_message_copier/transactional_message_copier.stderr | tee 
> -a /mnt/transactional_message_copier/transactional_message_copier.stdout &
> [DEBUG - 2021-08-24 07:49:12,567 - remoteaccount - _log - lineno:160]: 
> ubuntu@worker22: Running ssh command: export 
> LOG_DIR=/mnt/transactional_message_copier/logs; export KAFKA_OPTS=; export 
> KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:/mnt/transactional_message_copier/tools-log4j.properties";
>  /opt/kafka-dev/bin/kafka-run-class.sh 
> org.apache.kafka.tools.TransactionalMessageCopier --broker-list 
> worker13:9092,worker9:9092,worker20:9092 --transactional-id copier-0 
> --consumer-group transactions-test-consumer-group --input-topic input-topic 
> --output-topic output-topic --input-partition 0 --transaction-size 750 
> --transaction-timeout 40000 --enable-random-aborts 2>> 
> /mnt/transactional_message_copier/transactional_message_copier.stderr | tee 
> -a /mnt/transactional_message_copier/transactional_message_copier.stdout &
> [INFO  - 2021-08-24 07:49:58,379 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 0, remaining 20583
> [INFO  - 2021-08-24 07:49:58,842 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 0, remaining 20583
> [INFO  - 2021-08-24 07:49:59,112 - transactional_message_copier - _worker - 
> lineno:95]: copier-0: consumed 0, remaining 20583
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to