jiajunwang commented on issue #362: The WAGED rebalancer cluster model implementation URL: https://github.com/apache/helix/pull/362#issuecomment-517151234 @i3wangyi Thanks for the comments. For the general comments, I'm trying to answer here. > You could just have a blank method without detail implementations since the implementation details are very likely to change as we proceed. It's the same principle of interface oriented programming as well. I agree with the idea, but since we have the POC code already, I will just take this low-hanging fruit. With the latest update, all the essential parts are ready. > More and detailed comments are more than welcome, at least we need to document better than our current algorithm does. Agreed. Especially the algorithm and rebalancer main logic. When we working on those parts later, let's firstly try to keep it simple. Then adding comments whenever necessary. > If you agree with item1, let's not worry about the unit test. Things will change. Unit test is actually very important. Even with the interface only, I would prefer to have the test frame completed. Of course, they have been already completed anyway. > A lot of more basic data model is not well defined (I know we didn't originally have the habit)(. E.g, String class is used everywhere, sometimes it is partition name, sometimes it is a resource name, sometimes it is a state. It's very confusing and inconsistent, my experience tells me putting comments won't make our life easier, a more self-explanatory code is needed. I agree with this point in general. But I have two concerns. 1. If all Helix components create their own data model objects, it is a large overhead, from both coding's perspective and performance perspective. 2. If we have multiple data model objects pointing to the same item, it might be very confusing and error-prone. The reason is that most model objects we have now have additional fields other than the name. For example, Resource contains config. If we have multiple Resource objects refer to the same resource, we could modify the wrong one. If it's a String, although we might have duplication, we cannot do anything to it. In general, I think we can do either of the following to improve it. 1. Convert (or create) data model classes to the key classes for different types of objects. 2. Only constructing these data model objects in one place. For example, cluster cache. In this way, there is only one object for each entity. We won't have any confusion. I think it's a good call. Although I don't intend to do it in this PR, since it seems to be a larger decision to make, I will follow up with we have a more concrete plan. I will encourage you to think about this point at the whole project scope and maybe initial a discussion with all of us to proceed.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services