asafm commented on issue #16680:
URL: https://github.com/apache/pulsar/issues/16680#issuecomment-1202678136

   Excellent explanation @AnonHxy - I finally understand the whole mechanism.
   
   So in effect, you're adding another trigger condition for sending the batch, on 
top of the current max count, max size, and max delay: once a message is 
requested to be added to a batch, if the values of its properties (as defined 
in the configuration) differ from those of the records in the batch (i.e. the 
1st record's property values), then you trigger the batch flush (i.e. send and clear).
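
   A minimal sketch of the trigger described above, under stated assumptions: 
the class and method names (`PropertyAwareBatcher`, `add`, `flush`, 
`sentBatches`) are hypothetical and are not Pulsar's actual batch container, 
"sending" just records the batch in a list, and only the max-count trigger is 
modelled alongside the new property check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Pulsar's real batch container.
final class PropertyAwareBatcher {

    // Minimal stand-in for a producer message.
    static final class Message {
        final String payload;
        final Map<String, String> properties;

        Message(String payload, Map<String, String> properties) {
            this.payload = payload;
            this.properties = properties;
        }
    }

    private final List<String> restrictedKeys; // property names that must match within a batch
    private final int maxMessages;             // stand-in for the existing max-count trigger
    private final List<Message> current = new ArrayList<>();
    private final List<List<Message>> sent = new ArrayList<>();

    PropertyAwareBatcher(List<String> restrictedKeys, int maxMessages) {
        this.restrictedKeys = restrictedKeys;
        this.maxMessages = maxMessages;
    }

    // Values of the configured properties only; a missing property maps to null.
    private Map<String, String> restrictedValues(Message m) {
        Map<String, String> values = new HashMap<>();
        for (String key : restrictedKeys) {
            values.put(key, m.properties.get(key));
        }
        return values;
    }

    void add(Message m) {
        // New trigger: compare against the 1st record of the single in-flight batch.
        if (!current.isEmpty()
                && !restrictedValues(current.get(0)).equals(restrictedValues(m))) {
            flush();
        }
        current.add(m);
        // Existing trigger (max count), kept for comparison.
        if (current.size() >= maxMessages) {
            flush();
        }
    }

    // "Send and clear": in this sketch, sending just records the batch.
    void flush() {
        if (current.isEmpty()) {
            return;
        }
        sent.add(new ArrayList<>(current));
        current.clear();
    }

    List<List<Message>> sentBatches() {
        return sent;
    }
}
```

   Note that properties outside the configured list are ignored by the 
comparison, so two messages that differ only in an unlisted property still 
share a batch in this sketch.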
   
   So the side effect of this behavior is that you can easily end up with tiny 
batches, perhaps even 1 record per batch. There is a good chance that once users 
turn this feature on, they will lose all performance benefits of batching, since 
the batches will be very small. It depends entirely on the distribution of 
property values across consecutive messages. It might be a big trade-off you're 
asking of the user: giving up write performance, and perhaps read performance, 
in exchange for having the server-side filter work for batches. 
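
   To illustrate how much the outcome depends on the order of values, here is 
a small sketch that counts the batches produced for a given sequence. The 
class and method names (`BatchSizeEstimate`, `countBatches`) are hypothetical, 
each message is reduced to a single non-null property value, and only the 
max-count trigger is modelled next to the value-change trigger.

```java
import java.util.List;

// Hypothetical helper for reasoning about batch counts; not part of Pulsar.
final class BatchSizeEstimate {

    // Counts the batches sent when a batch is flushed either because the
    // incoming value differs from the batch's value or because the batch
    // already holds maxMessages records. Values are assumed to be non-null.
    static int countBatches(List<String> values, int maxMessages) {
        int batches = 0;
        int inBatch = 0;
        String batchValue = null;
        for (String value : values) {
            boolean differs = inBatch > 0 && !value.equals(batchValue);
            if (differs || inBatch == maxMessages) {
                batches++;      // flush the in-flight batch
                inBatch = 0;
            }
            if (inBatch == 0) {
                batchValue = value; // first record defines the batch's value
            }
            inBatch++;
        }
        // Count the last, still in-flight batch if it is non-empty.
        return inBatch > 0 ? batches + 1 : batches;
    }
}
```

   With a max count of 100, the sequence `a, b, a, b, a, b` yields 6 batches of 
1 record each, while `a, a, a, b, b, b` yields 2 batches of 3 records: same 
messages, same values, very different batching.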
   
   1. I would for sure document that trade-off very clearly in the PIP and in 
the configuration page of the producer.
   2. I would rephrase the explanation in this PIP to document the behavior: 
   
      > So the only solution is to change the way messages are batched and 
collect the records into a batch only if they have the same values for the 
properties configured to be extracted
    
      --> 
   
      "So the only solution is to change the way messages are batched and 
collect the records into a batch only if they have the same values for the 
properties configured to be extracted. If a message is added to the producer 
and the properties are not the same as the batched records, it will trigger a 
send of this batch and start a new batch with that message. "
    
   3. You need to emphasize throughout the document that adding a message with 
different properties is a trigger condition for sending the batch.
   4. Now that I understand, the name I suggested does not fit since 
`batchGroupByProperties` makes you think you are grouping records in several 
batches by those properties, which you don't since you have only 1 in-flight 
batch. Maybe `restrictSameValuesInBatchProperties` ?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
