rdhabalia commented on a change in pull request #4400: PIP 37: [pulsar-client] support large message size
URL: https://github.com/apache/pulsar/pull/4400#discussion_r373739715
 
 

 ##########
 File path: pulsar-client-api/src/main/java/org/apache/pulsar/client/api/ProducerBuilder.java
 ##########
 @@ -295,6 +295,32 @@
      * @return the producer builder instance
      */
     ProducerBuilder<T> enableBatching(boolean enableBatching);
+    
+    /**
+     * If the message size is higher than the maximum publish-payload size allowed by the broker, then enableChunking
+     * helps the producer split the message into multiple chunks and publish them to the broker separately and in
+     * order. So, it allows the client to successfully publish large messages in Pulsar.
+     * 
+     * 
+     * This feature allows a publisher to publish a large message by splitting it into multiple chunks and lets the
+     * consumer stitch them back together to form the original large published message. Therefore, it is necessary to
+     * apply the recommended configuration on both the Pulsar producer and consumer. Recommendations for using this feature:
+     * 
+     * <pre>
+     * 1. This feature is currently supported only for non-shared subscriptions and persistent topics.
 
 Review comment:
  It is harder to keep this responsibility in a wrapper because the wrapper would also have to stitch the messages back together while handling all failure scenarios. So, it's more logical to keep this responsibility within the client, which makes it simpler to handle. [Other systems](https://medium.com/workday-engineering/large-message-handling-with-kafka-chunking-vs-external-store-33b0fc4ccf14) also do something similar on the client side.
  Also, many customers already do such chunking and stitching at the application level so that their pipelines can transport large messages without impacting the cluster and other tenants. But they then have to take care of message sizing, chunking, dedup, and failure scenarios themselves, which makes the problem harder to solve. Therefore, it makes users' lives easier if this feature is part of the client library.
  Right now, all the users who need this feature have a streaming use case that uses an exclusive/failover subscription to consume data and push it to a grid or perform online analytical processing. So, we have introduced this feature for one type of subscription for now and will see in the future whether we need it for other types as well.
   
  > If a user is required to do these many things in order to enable this feature,
   
  Users can use this feature with the default values as well, but here we have added documentation on how to tune the configuration to get better performance out of it. So, there shouldn't be any complexity in supporting this feature, and it should be straightforward to use. In fact, this feature is most useful when Pulsar is deployed as a large-scale system and users can deal with large messages without impacting other tenants.
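  
  For reference, here is a minimal sketch of what the producer/consumer side could look like, assuming the `enableChunking(boolean)` setter proposed in this PR and an exclusive subscription on a persistent topic as discussed above; the topic name, service URL, and payload size are placeholder values, not part of this change:
  
  ```java
  import org.apache.pulsar.client.api.*;
  
  public class ChunkingSketch {
      public static void main(String[] args) throws Exception {
          PulsarClient client = PulsarClient.builder()
                  .serviceUrl("pulsar://localhost:6650")
                  .build();
  
          // Producer: chunking splits an oversized payload into ordered chunks;
          // batching is disabled because chunking applies to individual messages.
          Producer<byte[]> producer = client.newProducer()
                  .topic("persistent://public/default/large-messages")
                  .enableChunking(true)
                  .enableBatching(false)
                  .create();
  
          // Payload larger than the broker's max publish-payload size (placeholder size).
          producer.send(new byte[10 * 1024 * 1024]);
  
          // Consumer: chunks are stitched back together before the message is delivered.
          Consumer<byte[]> consumer = client.newConsumer()
                  .topic("persistent://public/default/large-messages")
                  .subscriptionName("large-message-sub")
                  .subscriptionType(SubscriptionType.Exclusive)
                  .subscribe();
  
          Message<byte[]> msg = consumer.receive();
          consumer.acknowledge(msg);
  
          producer.close();
          consumer.close();
          client.close();
      }
  }
  ```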

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
