Sorry for not catching up on this thread earlier. I wanted to do this
before the KIP got its updates so we could discuss if need be and not waste
more time rewriting or reworking things that folks have issues with. I
captured all the comments so far here with responses.

<< So fair assignment by count (taking into account the current partition
count of each broker) is very good. However, it's worth noting that all
partitions are not created equal. We have actually been performing more
rebalance work based on the partition size on disk, as given equal
retention of all topics, the size on disk is a better indicator of the
amount of traffic a partition gets, both in terms of storage and network
traffic. Overall, this seems to be a better balance.

Agreed, though this is out of scope (imho) for what the motivations for the
KIP were. The motivations section is blank (that is on me), but honestly it
is because we did all the development, went back and forth with Neha on the
testing, and then had to back it all into the KIP process. It's a
time/resource/scheduling issue and I hope to update the KIP soon. All of
this is in the JIRA and the code patch, so it's not like it is not there, just
not in the place where folks may be looking, since we changed where folks
should look.

Initial cut at "Motivations": --generate is not used by a lot of folks
because they don't trust it. One issue is that it sometimes gives different
results when you run it. Other feedback from the community is that it
does not account for specific use cases like "adding new brokers" and
"removing brokers" (which is where that patch started
https://issues.apache.org/jira/browse/KAFKA-1678 but then we changed it
after review into just --rebalance
https://issues.apache.org/jira/browse/KAFKA-1792). The use case for adding and
removing brokers is one that happens in AWS with auto scaling. There are
other reasons for this too of course. The goal originally was to make what
folks are already coding today (with the output of " available in the
project for the community. Based on the discussion in the JIRA with Neha we
all agreed that making it a fair rebalance would fulfill both use
cases.

<< In addition to this, I think there is very much a need to have Kafka be
rack-aware. That is, to be able to assure that for a given cluster, you
never assign all replicas for a given partition in the same rack. This
would allow us to guard against maintenances or power failures that affect
a full rack of systems (or a given switch).

Agreed, though I think this is out of scope for this change and something
we can also do in the future. There is more that we have to figure out for
rack awareness, specifically answering "how do we know what rack the broker is
on". I really really (really) worry that if we keep trying to put too much
into a single change, the discussions go into rabbit holes and good,
important features (that are community driven) that could get out there
will get bogged down with different use cases and scope creep. So, I think
rack awareness is its own KIP that has two parts: setting the broker rack and
rebalancing for that. That feature doesn't invalidate the need for
--rebalance but can be built on top of it.
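
To make the rack constraint from the comment above concrete, here is a
minimal sketch of a check for it. It is not part of the KIP or the patch.
The broker-to-rack map is an assumption (how a broker's rack would be known
is exactly the open question), and the function name and sample data are
hypothetical.

```python
from typing import Dict, List


def partitions_on_single_rack(
    assignment: Dict[str, List[int]], broker_rack: Dict[int, str]
) -> List[str]:
    """Return the partitions whose replicas all sit on one rack.

    `assignment` maps a partition label to its replica broker ids.
    `broker_rack` maps a broker id to a rack label (assumed to be known).
    """
    offenders = []
    for partition, replicas in sorted(assignment.items()):
        racks = {broker_rack[broker] for broker in replicas}
        # A single-replica partition cannot be spread, so only flag
        # partitions that have several replicas on the same rack.
        if len(replicas) > 1 and len(racks) == 1:
            offenders.append(partition)
    return offenders


# Hypothetical cluster: brokers 1 and 2 share a rack, broker 3 does not.
broker_rack = {1: "rack-a", 2: "rack-a", 3: "rack-b"}
assignment = {"events-0": [1, 2], "events-1": [2, 3]}
print(partitions_on_single_rack(assignment, broker_rack))  # ['events-0']
```

A check like this could run over any proposed assignment, which is one way
rack awareness could be layered on top of --rebalance later.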

<< I think it would make sense to implement the reassignment logic as a
pluggable component. That way it would be easy to select a scheme when
performing a reassignment (count, size, rack aware). Configuring a default
scheme for a cluster would allow for the brokers to create new topics and
partitions in compliance with the requested policy.

I don't agree with this because right now you get back "the current state
of the partitions" so you can (today) write whatever logic you want (with
the information that is there). With --rebalance you also get that back.
Moving forward we can maybe expose more information so that
folks can write whatever different logic they want
(like partition number, location (label string for rack), size, throughput
average, etc.), but again, that to me is a different
KIP entirely from the motivations of this one. If eventually we want to
make it pluggable then we should have a KIP and discussion around it. I just
don't see how it relates to the original motivations of the change.
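
As an illustration of writing your own logic against the current state, the
sketch below counts replicas per broker. The JSON shape (a "version" field
plus a "partitions" list of topic/partition/replicas entries) follows what
the reassignment tool prints; the topic name, broker ids, and function name
are made up for the example.

```python
import json
from collections import Counter

# Made-up sample in the shape of the reassignment tool's JSON output.
current_state = """
{"version": 1,
 "partitions": [
   {"topic": "events", "partition": 0, "replicas": [1, 2]},
   {"topic": "events", "partition": 1, "replicas": [1, 3]},
   {"topic": "events", "partition": 2, "replicas": [1, 2]}
 ]}
"""


def replicas_per_broker(state_json: str) -> Counter:
    """Count how many partition replicas each broker currently holds."""
    state = json.loads(state_json)
    counts = Counter()
    for entry in state["partitions"]:
        counts.update(entry["replicas"])
    return counts


counts = replicas_per_broker(current_state)
print(sorted(counts.items()))  # [(1, 3), (2, 2), (3, 1)]
```

Any custom policy (by count, by size, by rack) would start from a pass like
this one and then emit a new "partitions" list in the same shape.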

<< Is it possible to describe the proposed partition reassignment algorithm
in more detail on the KIP? In fact, it would be really easy to understand
if we had some concrete examples comparing partition assignment with the
old algorithm and the new.

Sure, though it is in the JIRA linked to the KIP too
https://issues.apache.org/jira/browse/KAFKA-1792 and documented in comments
in the patch also, as requested. Let me know if this is what you are looking
for and we can simply update the KIP with this information, or please give more
detail on specifically what you think might be missing.

<< Would we want to support some kind of automated reassignment of existing
partitions (personally - no. I want to trigger that manually because it is a
very disk and network intensive process)?

You can automate the reassignment with a line of code that takes the
response and calls --execute if folks want that... I don't think we should
ever link these (or at least not yet) because of the reasons you say. I
think as long as we have a way

********

If there is anything else I missed please let me know so I can make sure
that the detail gets updated so we minimize the back and forth both in
effort and elapsed time. This was always supposed to be a very small fix
for something that pains A LOT of people and I want to make sure that we
aren't running scope creep on the change but are making sure that folks
understand the motivation behind a new feature.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Sun, Mar 8, 2015 at 1:21 PM, Joe Stein <joe.st...@stealth.ly> wrote:

> Jay,
>
> That makes sense.  I think what folks are bringing up all sounds great but
> I feel can/should be done afterwards as further improvements as the scope
> for this change has a very specific focus to resolve problems folks have
> today with --generate (with a patch tested and ready to go). I should be
> able to update the KIP this week and follow up.
>
> ~ Joestein
> On Mar 8, 2015 12:54 PM, "Jay Kreps" <jay.kr...@gmail.com> wrote:
>
>> Hey Joe,
>>
>> This still seems pretty incomplete. It still has most of the sections just
>> containing the default text you are supposed to replace. It is really hard
>> to understand what is being proposed and why and how much of the problem
>> we are addressing. For example the motivation section just says
>> "operational".
>>
>> I'd really like us to do a good job of this. I actually think putting the
>> time in to convey context really matters. For example I think (but can't
>> really know) that what you are proposing is just a simple fix to the JSON
>> output of the command line tool. But you can see that on the thread it is
>> quickly going to spiral into automatic balancing, rack awareness, data
>> movement throttling, etc.
>>
>> Just by giving people a fairly clear description of the change and how it
>> fits into other efforts that could happen in the area really helps keep
>> things focused on what you want.
>>
>> -Jay
>>
>>
>> On Wed, Jan 21, 2015 at 10:11 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>>
>> > Posted a KIP for --re-balance for partition assignment in reassignment
>> > tool.
>> >
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+re-balancing
>> >
>> > JIRA https://issues.apache.org/jira/browse/KAFKA-1792
>> >
>> > While going through the KIP I thought of one thing from the JIRA that we
>> > should change. We should preserve --generate to be existing functionality
>> > for the next release it is in. If folks want to use --re-balance then
>> > great, it just won't break any upgrade paths, yet.
>> >
>> > /*******************************************
>> >  Joe Stein
>> >  Founder, Principal Consultant
>> >  Big Data Open Source Security LLC
>> >  http://www.stealth.ly
>> >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>> > ********************************************/
>> >
>>
>
