devinbost commented on issue #4012: Adding upsert functionality
URL: https://github.com/apache/pulsar/pull/4012#issuecomment-481832662

@jerrypeng Thank you for your very detailed response. I appreciate your time and attention to this matter.

Regarding:

> Is there a reason why you can't just submit/update functions via the REST endpoints instead of using the pulsar-admin CLI from docker containers? Submitting/Updating functions by just making a HTTP REST call will be a lot faster . . .

I appreciate your guidance. Based on advice from @merlimat earlier today, I am currently working on an implementation using the REST endpoints.

Regarding:

> Do you have 300 individual functions or is there a function with 300 instances or a group of functions that total 300 instances? There will be a huge submission time difference depending on which scenario. Submitting one function with 300 instances will take much less time that submitting 300 functions with one instance each.

At the moment, all of our functions are individual because they represent different use cases. However, we appreciate your advice about the performance improvement that we would get from deploying function instances, so we will examine ways to refactor to obtain those benefits.

Regarding:

> What do you mean by this? The cluster will be running as it should when submitting functions.

I may have been unintentionally misleading, and I apologize for that. Please let me clarify. When I said:

> Pulsar is in a broken state

I didn't mean that the Pulsar cluster is not running. What I meant is that our end-to-end production message pipelines will be in a broken state (i.e., our customers will experience problems).

Consider a plumbing analogy. If you need to re-route pipes while water is flowing and you can't do it extremely quickly, water will end up leaking everywhere, and the people who are expecting water at a particular location will notice a loss of service. This doesn't mean that the water system is completely broken or that water is not flowing; it means that water is not reaching our customers.

In our case, we have a production data flow that is processing tens of thousands of messages per second. If we need to deploy updates to functions that are inter-dependent, then until all of the updated functions are deployed, some of them may introduce breaking changes that could cause data loss or prevent messages from reaching the final destination topic. Does this make more sense?

Regarding:

> I think functionality you are looking is bulk create, update, or upserts. You want to bring a cluster from a potentially unknown state into a known consistent state in regards to functions. I am I understanding you correctly?

That is exactly right. I think you're right that we likely won't always need to update all 300 functions every time we deploy updates. However, we need to ensure that Pulsar can quickly and seamlessly match the expected state when we deploy updates.

Regarding:

> While we can add upserts and even bulk upserts. I would suggest you to try just creating/updating functions directly using the REST endpoint first to see if that is good enough.

I will investigate your suggestions for implementing these changes for bulk actions. Thank you also for the guidance and example change to ComponentImpl.java for the Upsert functionality for this PR.
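
To make the "unknown state to known consistent state" idea concrete, here is a rough sketch of the planning step only. It is not taken from this PR or from the Pulsar codebase: `UpsertPlan` and `plan_upserts` are hypothetical names, and the function configs are assumed to be plain dictionaries keyed by function name. It only decides which functions would need a create, which an update, and which could be skipped; the actual REST calls are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class UpsertPlan:
    """Hypothetical container for the outcome of comparing two states."""
    create: list = field(default_factory=list)
    update: list = field(default_factory=list)
    unchanged: list = field(default_factory=list)


def plan_upserts(desired, current):
    """Compare desired function configs against what is currently deployed.

    Both arguments are assumed to map a function name to its config dict.
    A name missing from `current` is scheduled for create, a name whose
    config differs is scheduled for update, and identical configs are
    reported as unchanged so that no call needs to be made for them.
    """
    plan = UpsertPlan()
    for name in sorted(desired):
        if name not in current:
            plan.create.append(name)
        elif current[name] != desired[name]:
            plan.update.append(name)
        else:
            plan.unchanged.append(name)
    return plan


if __name__ == "__main__":
    desired = {
        "enrich": {"parallelism": 2},
        "route": {"parallelism": 1},
        "audit": {"parallelism": 1},
    }
    current = {
        "enrich": {"parallelism": 1},
        "route": {"parallelism": 1},
    }
    plan = plan_upserts(desired, current)
    print(plan.create, plan.update, plan.unchanged)
    # prints: ['audit'] ['enrich'] ['route']
```

With a plan like this, a deployment that only touches a few of the 300 functions would issue only a few create or update requests, which is the main reason to diff before submitting.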
