chia7712 commented on a change in pull request #9026:
URL: https://github.com/apache/kafka/pull/9026#discussion_r455500775



##########
File path: tests/kafkatest/tests/core/transactions_test.py
##########
@@ -47,11 +47,15 @@ def __init__(self, test_context):
         self.num_output_partitions = 3
         self.num_seed_messages = 100000
         self.transaction_size = 750
-        # The timeout of transaction should be lower than the timeout of 
verification. The transactional message sent by
-        # client may be not correctly completed in hard_bounce mode. The 
pending transaction (unstable offset) stored by
-        # broker obstructs TransactionMessageCopier from getting offset of 
partition which is used to calculate
-        # remaining messages after restarting.
-        self.transaction_timeout = 5000
+
+        # The transaction timeout should be lower than the progress timeout, 
but at
+        # least as high as the request timeout (which is 30s by default). When 
the
+        # client is hard-bounced, progress may depend on the previous 
transaction
+        # being aborted. When the broker is hard-bounced, we may have to wait 
as
+        # long as the request timeout to get a `Produce` response and we do not
+        # want the coordinator timing out the transaction.
+        self.transaction_timeout = 40000
+        self.progress_timeout_sec = 60

Review comment:
       > I decided to hold off on this, but I can reconsider it if you think 
it's worthwhile.
   
   it is fine to me as your approach is more simple :)
   
   > What might be preferable is to provide a way to override the request 
timeout in TransactionalMessageCopier so that we can use lower values in all 
cases.
   
   the root cause I observed is different to 
https://issues.apache.org/jira/browse/KAFKA-9802. On my local, 
```TransactionalMessageCopier``` fails due to ProducerFencedException which is 
caused by that broker increases the producer epoch when aborting transaction. 
   ```
                       } catch (ProducerFencedException | 
OutOfOrderSequenceException e) {
                           // We cannot recover from these errors, so just 
rethrow them and let the process fail
                           throw e;
                       } catch (KafkaException e) {
                           producer.abortTransaction();
                           resetToLastCommittedPositions(consumer);
                       }
   ```
   
   Perhaps we should make ```TransactionalMessageCopier``` recoverable from 
transaction timeout before 
[KIP-558](https://cwiki.apache.org/confluence/display/KAFKA/KIP-588%3A+Allow+producers+to+recover+gracefully+from+transaction+timeouts)
 is addressed.
   
   BTW, [group_mode_transactions_test.py]( 
https://github.com/apache/kafka/blob/trunk/tests/kafkatest/tests/core/group_mode_transactions_test.py#L256)
 has similar issue. Could you fix it also? Or we can apply your approach to fix 
```group_mode_transactions_test.py``` in another PR.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to