rdhabalia commented on a change in pull request #4400: PIP 37: [pulsar-client]
support large message size
URL: https://github.com/apache/pulsar/pull/4400#discussion_r373739715
##########
File path:
pulsar-client-api/src/main/java/org/apache/pulsar/client/api/ProducerBuilder.java
##########
@@ -295,6 +295,32 @@
* @return the producer builder instance
*/
ProducerBuilder<T> enableBatching(boolean enableBatching);
+
+ /**
+ * If the message size is higher than the broker's allowed max publish-payload size, then enableChunking helps the
+ * producer split the message into multiple chunks and publish them to the broker separately and in order, so the
+ * client can successfully publish large messages to Pulsar.
+ *
+ * This feature allows the publisher to publish a large message by splitting it into multiple chunks and lets the
+ * consumer stitch them back together to form the original large published message. Therefore, it is necessary to
+ * apply the recommended configuration on the Pulsar producer and consumer. Recommendations for using this feature:
+ *
+ * <pre>
+ * 1. This feature is currently only supported for non-shared subscriptions and persistent topics.
Review comment:
It is harder to keep this responsibility in a wrapper because the wrapper would also need to stitch the messages back together and handle all the failure scenarios. So, it's more logical to keep this responsibility within the client, which makes it simpler to handle them. [Other systems](https://medium.com/workday-engineering/large-message-handling-with-kafka-chunking-vs-external-store-33b0fc4ccf14) also do similar things at the client side.
Also, many customers already do such chunking and stitching at the application level in their pipelines to transport large messages without impacting the cluster and other tenants. But they also have to take care of message sizing, chunking, dedup, and failure scenarios, which makes this problem harder to solve. Therefore, it makes users' lives easier if this feature is part of the client lib.
Right now, all the users who need this feature have a streaming use case that requires an exclusive/failover subscription to consume data and push it to a grid or perform online analytical processing. So, we have introduced this feature for one subscription type for now and will see in the future whether we need it for other types as well.
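
For illustration, here is a minimal consumer-side sketch (not part of the PR) of the kind of exclusive subscription on a persistent topic described above; the service URL, topic, and subscription name are placeholders:

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class ChunkedMessageConsumerSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder URL
                .build();

        // Exclusive (non-shared) subscription on a persistent topic, which is the
        // only combination supported for chunked messages at this point.
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/large-payload-topic") // placeholder topic
                .subscriptionName("large-msg-sub")
                .subscriptionType(SubscriptionType.Exclusive)
                .subscribe();

        // The client library stitches the chunks back together, so the application
        // receives the original large message as a single payload.
        Message<byte[]> msg = consumer.receive();
        consumer.acknowledge(msg);

        consumer.close();
        client.close();
    }
}
```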
> If a user is required to do these many things in order to enable this feature,

Users can use this feature with default values as well; here, we have added documentation on how to get better performance out of the feature by tuning the configuration. So, there shouldn't be any complexity in supporting this feature, and it should be straightforward to use. In fact, this feature is most useful when Pulsar is deployed as a large-scale system and users need to deal with large messages without impacting other tenants.
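
To make the "default values" point concrete, here is a minimal producer-side sketch (assumed usage, not copied from the PR) that simply turns chunking on; the service URL and topic are placeholders, and batching is disabled here on the assumption that chunking is used without batching:

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class ChunkedMessageProducerSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder URL
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://public/default/large-payload-topic") // placeholder topic
                .enableChunking(true)   // let the client split oversized payloads into chunks
                .enableBatching(false)  // assumption: chunking is used with batching disabled
                .create();

        // A payload larger than the broker's max publish size is split into chunks
        // by the client and published in order.
        byte[] largePayload = new byte[10 * 1024 * 1024]; // e.g. a 10 MB message
        producer.send(largePayload);

        producer.close();
        client.close();
    }
}
```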
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services