cmccabe commented on PR #13280: URL: https://github.com/apache/kafka/pull/13280#issuecomment-1500536745
> i think you are referring to having to implement that if we don't go with the current PR implementation and instead go with something like what you proposed -- since the current PR wrapper implementations don't implement the Set/Map interfaces. Is that correct? In other words, I think you are arguing against your proposal -- I just want to be sure. So firstly, as per the earlier discussion in the PR, I think we have to implement `Set`, `Map`, and `TreeMap`. Using custom types just hurts programmer productivity and ability to understand the code too much. Obviously there are going to be some differences with the `java.util` versions, but hopefully those can be minimal, so people's intuitions about what is O(1), O(N), O(log N) can be correct, as well as people's understanding of what the code does. So the contest here is between 1. implementing your own Set, Map, TreeMap wrappers above the library code, or 2. implementing a helper class like I suggested above 3. just using pcollections directly, and adding a wrapper class only for the libraries that need it, like Vavr My main argument is that 1 is a lot of tricky code that we don't really need, and likely worse performance (it's hard to argue about it without benchmarks, of course). 3 will make it slightly harder to try different collections libraries. Although in theory we could ask people to use our own helper functions rather than PCollections-specific methods, to minimize the delta needed. Perhaps this is an avenue worth exploring. > I think it is important for people to be able to easily understand what performance they are going to get from a function. If that's not clear, and we end up giving O(1) performance if the downcast to a persistent collection can occur vs. O(N) performance if it cannot -- I think that would make it more likely for O(N) performance to occur inadvertently. Especially since people won't necessarily be able to know what type they have just by reading the immediate code -- if all they know is that have a java.util.Map then they might have to trace back to figure out where they actually got that, what the actual type is, and therefore what O() behavior they are going to get if they try to add something to it. There is a tension between what you are trying to do here (abstract over multiple libraries) and the idea that "people [need] to be able to easily understand what performance they are going to get from a function." If a new library we add later is somewhat better at get() and worse at put(), does that means that people using your wrappers can't "easily understand what performance they are going to get"? Arguably yes. So should we ditch the wrappers and use the libraries directly? Ultimately the test of performance is our JMH benchmarks, as well as the profiles we take in production. I believe we should rely on those, rather than assuming we can deduce everything about performance ahead of time... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org