[GitHub] [helix] jiajunwang commented on issue #362: The WAGED rebalancer cluster model implementation

GitBox Thu, 01 Aug 2019 00:08:57 -0700

jiajunwang commented on issue #362: The WAGED rebalancer cluster model 
implementation
URL: https://github.com/apache/helix/pull/362#issuecomment-517151234
 
 
   @i3wangyi Thanks for the comments. For the general comments, I'm trying to 
answer here.
   
   > You could just have a blank method without detail implementations since 
the implementation details are very likely to change as we proceed. It's the 
same principle of interface oriented programming as well.
   
   I agree with the idea, but since we have the POC code already, I will just 
take this low-hanging fruit. With the latest update, all the essential parts 
are ready.
   
   > More and detailed comments are more than welcome, at least we need to 
document better than our current algorithm does.
   
   Agreed. Especially the algorithm and rebalancer main logic. When we working 
on those parts later, let's firstly try to keep it simple. Then adding comments 
whenever necessary.
   
   > If you agree with item1, let's not worry about the unit test. Things will 
change.
   
   Unit test is actually very important. Even with the interface only, I would 
prefer to have the test frame completed. Of course, they have been already 
completed anyway.
   
   > A lot of more basic data model is not well defined (I know we didn't 
originally have the habit)(. E.g, String class is used everywhere, sometimes it 
is partition name, sometimes it is a resource name, sometimes it is a state. 
It's very confusing and inconsistent, my experience tells me putting comments 
won't make our life easier, a more self-explanatory code is needed.
   
   I agree with this point in general. But I have two concerns.
   1. If all Helix components create their own data model objects, it is a 
large overhead, from both coding's perspective and performance perspective.
   2. If we have multiple data model objects pointing to the same item, it 
might be very confusing and error-prone. The reason is that most model objects 
we have now have additional fields other than the name. For example, Resource 
contains config. If we have multiple Resource objects refer to the same 
resource, we could modify the wrong one. If it's a String, although we might 
have duplication, we cannot do anything to it.
   
   In general, I think we can do either of the following to improve it.
   1. Convert (or create) data model classes to the key classes for different 
types of objects.
   2. Only constructing these data model objects in one place. For example, 
cluster cache. In this way, there is only one object for each entity. We won't 
have any confusion.
   
   I think it's a good call. Although I don't intend to do it in this PR, since 
it seems to be a larger decision to make, I will follow up with we have a more 
concrete plan. I will encourage you to think about this point at the whole 
project scope and maybe initial a discussion with all of us to proceed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [helix] jiajunwang commented on issue #362: The WAGED rebalancer cluster model implementation

Reply via email to