asafm commented on issue #16680:
URL: https://github.com/apache/pulsar/issues/16680#issuecomment-1202678136
Excellent explanation @AnonHxy - I finally understand the whole mechanism.
So in effect, you're adding another trigger condition to send the batch, on
top of the current max count, max size, and max delay: once a message is
requested to be added to a batch, if the values of its properties (as defined
in the configuration) differ from those of the records already in the batch
(i.e. the 1st record's property values), then you trigger the batch flush
(i.e. send and clear).
So the side effect of this behavior is that you can easily end up with tiny
batches, perhaps even 1 record per batch. There is a good chance that once
users turn this feature on, they will lose most of the performance benefits of
batching, since the batches will be very small. It depends entirely on the
distribution of property values across consecutive messages. It might be a big
trade-off you're asking of the user: they might trade off write performance,
and perhaps read performance, for the ability to have the server-side filter
work for batches.
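To make the trigger and its side effect concrete, here is a minimal toy model of how I understand it. It is not Pulsar code: the class `SingleBatchContainer`, the `region` property, and the `max_count` default are all made up for illustration, and only the max-count trigger is modeled alongside the new one.

```python
from typing import Dict, List, Tuple

Message = Tuple[Dict[str, str], bytes]  # (properties, payload)


class SingleBatchContainer:
    """Toy model: one in-flight batch, flushed when property values change."""

    def __init__(self, keys: List[str], max_count: int = 1000) -> None:
        self.keys = keys            # hypothetical: property names configured for extraction
        self.max_count = max_count  # stands in for the existing max-count trigger
        self.batch: List[Message] = []
        self.sent: List[List[Message]] = []  # batches "sent" so far

    def _values(self, props: Dict[str, str]) -> tuple:
        return tuple(props.get(k) for k in self.keys)

    def add(self, props: Dict[str, str], payload: bytes) -> None:
        # New trigger: values differ from the 1st record in the batch.
        if self.batch and self._values(props) != self._values(self.batch[0][0]):
            self.flush()
        self.batch.append((props, payload))
        if len(self.batch) >= self.max_count:
            self.flush()

    def flush(self) -> None:
        # Send and clear; a no-op when the batch is empty.
        if self.batch:
            self.sent.append(self.batch)
            self.batch = []


def batch_sizes(regions: List[str]) -> List[int]:
    container = SingleBatchContainer(keys=["region"])
    for region in regions:
        container.add({"region": region}, b"x")
    container.flush()
    return [len(b) for b in container.sent]


grouped = batch_sizes(["us"] * 4 + ["eu"] * 4)
alternating = batch_sizes(["us", "eu"] * 4)
print(grouped)      # [4, 4]
print(alternating)  # [1, 1, 1, 1, 1, 1, 1, 1]
```

The same eight messages produce two batches of four when equal values arrive together, and eight batches of one when the values alternate, which is the worst case described above.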
1. I would for sure document that trade-off very clearly in the PIP and in
the configuration page of the producer.
2. I would rephrase the explanation in this PIP to document the behavior:
> So the only solution is to change the way messages are batched and
collect the records into a batch only if they have the same values for the
properties configured to be extracted
-->
"So the only solution is to change the way messages are batched and
collect the records into a batch only if they have the same values for the
properties configured to be extracted. If a message is added to the producer
and its property values are not the same as those of the batched records, it
will trigger a send of this batch and start a new batch with that message."
3. You need to emphasize throughout the document the new trigger condition:
the batch is sent once a message with different property values is added.
4. Now that I understand, the name I suggested does not fit, since
`batchGroupByProperties` makes you think you are grouping records into several
batches by those properties, which you don't, since you have only 1 in-flight
batch. Maybe `restrictSameValuesInBatchProperties`?