fantapsody commented on issue #6173: Log compaction fails due to timeout
URL: https://github.com/apache/pulsar/issues/6173#issuecomment-582480075

Thanks for all the details, @EugenDueck. I can now reliably reproduce the problem as described.

I found a few problems with log compaction:

1. Compaction fails for empty topics. This can be fixed with an additional check before compaction starts.
2. Compaction cannot finish if the last message has an empty value when compaction is triggered. This can be fixed by adding the missing check in phase 2 of the compaction. Please note that a message with an empty value indicates a [deletion](https://pulsar.apache.org/docs/en/concepts-topic-compaction/#how-topic-compaction-works). @pienio7
3. Compaction fails under massive writes. This one is more complicated: the raw reader for compaction stops reading new messages after the first batch (100 entries) of raw messages (which may contain many more sub-messages), and increasing the timeout duration doesn't help. There seems to be something wrong with flow control here, and I have to do more testing and analysis.
   https://github.com/apache/pulsar/blob/13d8ecd20a3c6795405fbf5946c1907e9c90dd91/pulsar-broker/src/main/java/org/apache/pulsar/compaction/TwoPhaseCompactor.java#L106

For problem 3, I collected the stats of the topic while the compaction was stuck, and some metrics, such as `availablePermits` and `lastAckedTimestamp`, look anomalous. Maybe someone could provide some insight or advice based on these metrics and the problems described?
@ivankelly @sijie

```json
{
  "msgRateIn" : 311362.5386883373,
  "msgThroughputIn" : 7316116.5272862585,
  "msgRateOut" : 0.0,
  "msgThroughputOut" : 0.0,
  "averageMsgSize" : 23.497099420201696,
  "storageSize" : 238074383,
  "backlogSize" : 238074383,
  "publishers" : [ {
    "msgRateIn" : 311362.5386883373,
    "msgThroughputIn" : 7316116.5272862585,
    "averageMsgSize" : 23.0,
    "producerId" : 0,
    "metadata" : { },
    "address" : "/127.0.0.1:57444",
    "producerName" : "standalone-1-3",
    "connectedSince" : "2020-02-05T22:29:10.529+08:00",
    "clientVersion" : "2.5.0"
  } ],
  "subscriptions" : {
    "__compaction" : {
      "msgRateOut" : 0.0,
      "msgThroughputOut" : 0.0,
      "msgRateRedeliver" : 0.0,
      "msgBacklog" : 11181,
      "blockedSubscriptionOnUnackedMsgs" : false,
      "msgDelayed" : 0,
      "unackedMessages" : 40122,
      "type" : "Exclusive",
      "activeConsumerName" : "3e96d",
      "msgRateExpired" : 0.0,
      "lastExpireTimestamp" : 0,
      "lastConsumedFlowTimestamp" : 1580912972315,
      "lastConsumedTimestamp" : 1580912972386,
      "lastAckedTimestamp" : 0,
      "consumers" : [ {
        "msgRateOut" : 0.0,
        "msgThroughputOut" : 0.0,
        "msgRateRedeliver" : 0.0,
        "consumerName" : "3e96d",
        "availablePermits" : -39122,
        "unackedMessages" : 40122,
        "blockedConsumerOnUnackedMsgs" : false,
        "lastAckedTimestamp" : 0,
        "lastConsumedTimestamp" : 1580912972386,
        "metadata" : { },
        "address" : "/127.0.0.1:56969",
        "connectedSince" : "2020-02-05T22:29:32.314+08:00",
        "clientVersion" : "2.6.0-SNAPSHOT"
      } ],
      "isReplicated" : false
    }
  },
  "replication" : { },
  "deduplicationStatus" : "Disabled",
  "bytesInCounter" : 238088065,
  "msgInCounter" : 9985337
}
```
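
One thing that stands out in the stats above: `availablePermits` (-39122) plus `unackedMessages` (40122) is exactly 1000. The toy model below is only a sketch of how that could arise, assuming permits are granted once and then charged per sub-message rather than per entry. That assumption is not verified against the broker code, and `PermitSketch` with all of its methods is hypothetical, not part of Pulsar.

```java
import java.util.List;

/** Toy model of permit-based flow control. Hypothetical; not Pulsar's implementation. */
class PermitSketch {

    /**
     * Permits left after dispatching a list of entries, assuming (unverified)
     * that each entry is charged one permit per sub-message it contains.
     */
    static int permitsAfterDispatch(int grantedPermits, List<Integer> subMessagesPerEntry) {
        int permits = grantedPermits;
        for (int subMessages : subMessagesPerEntry) {
            permits -= subMessages;
        }
        return permits;
    }

    /** Initial grant implied by the two counters, under the same assumption. */
    static int impliedGrant(int availablePermits, int unackedMessages) {
        return availablePermits + unackedMessages;
    }

    /** Flags the pattern in the stats: negative permits, nothing ever acked, backlog left. */
    static boolean looksStalled(int availablePermits, long lastAckedTimestamp, long msgBacklog) {
        return availablePermits < 0 && lastAckedTimestamp == 0 && msgBacklog > 0;
    }

    public static void main(String[] args) {
        // Made-up illustration: 1000 permits, three entries of 500 sub-messages each.
        System.out.println(permitsAfterDispatch(1000, List.of(500, 500, 500))); // prints -500

        // Values copied from the "__compaction" subscription in the stats above.
        System.out.println(impliedGrant(-39122, 40122));     // prints 1000
        System.out.println(looksStalled(-39122, 0L, 11181L)); // prints true
    }
}
```

If the model holds, the consumer would have to ack or grant roughly 39122 more permits before the broker dispatches again, which would match a reader that stops after its first batch. This is a possible reading of the numbers, not a confirmed diagnosis.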
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
