This isn't a bad idea, but I think it's mitigated by the fact that we clear the set instead of discarding it and creating a new one on flush. This means that it should quickly "settle in" to a size that accomodates the working set and stay there for the duration of the execution. So the cost of growing the set would only be incurred within the first few minutes or hours of execution.
I think... [ Full content available at: https://github.com/apache/kafka/pull/5724 ] This message was relayed via gitbox.apache.org for [email protected]
