I would prefer to not add a new method. It seems unnecessary.
`localThreadMetadata` does return all threads in all states(*) and thus
provides full insight.

(*) A thread in state DEAD could be returned as long as it's not removed
yet.

I don't see any advantage to pre-filter threads and to exclude threads
in state CREATE or PENDING_SHUTDOWN. Even if a CREATED thread is not
started yet, it is still "alive" in a broader sense. For example, if a
user wants to scale out to 10 thread, and 8 are RUNNING and 2 are in
state CREATED, a user won't need to add 2 more threads -- there are
already 10 threads.

For PENDING_SHUTDOWN and scale in it would be different I guess, as the
proposal would be to filter them out right away. However, filtering them
seems actually not to be "correct", as a thread in PENDING_SHUTDOWN
might still process data and it's thus still "alive".

If there is still a need later to add a new method about "alive thread"
we can always add as a follow up -- removing things is much harder.

I also don't think that there is value in returning names of dead
threads, as we recycle names.


-Matthias


On 9/9/20 10:04 AM, Sophie Blee-Goldman wrote:
> I agree that the current behavior of localThreadsMetadata() does not seem
> to match, but it seems like we will be forced to change it to only return
> currently-alive threads. For one thing, we plan to recycle old thread names.
> It would be pretty confusing for a user to get two (or more) ThreadMetadata
> objects returned with the same name, since AFAICT this is the only
> distinguishing identifier of stream threads. I think we should enforce that
> only live threads are returned by localThreadsMetadata(). Plus, as Matthias
> pointed out, we plan to remove dead threads from the KafkaStreams client,
> so still returning them in the metadata would be extremely odd.
> 
> If we think that there might be some use case that requires knowing which
> threads have died, we could consider adding a method that returns the
> names of dead threads. But the only use case I can imagine would probably
> be better served by a callback that gets invoked when the thread dies, which
> we already have.
> 
> On Tue, Sep 8, 2020 at 11:46 PM Bruno Cadonna <br...@confluent.io> wrote:
> 
>> Hi Matthias and Sophie,
>>
>> I agree that localThreadsMetadata() can be used here. However,
>> localThreadsMetadata() returns all stream threads irrespectively of
>> their states. Alive stream threads are specified as being in one of the
>> following states: RUNNING, STARTING, PARTITIONS_REVOKED, and
>> PARTITIONS_ASSIGNED. Hence, users would need to filter the result of
>> localThreadsMetadata(). I thought, it would be neat to have a method
>> that hides this filtering and returns the number of alive stream
>> threads, because that is the most basic information you might need to
>> decide about adding or removing stream threads. For all more advanced
>> use cases users should use localThreadsMetadata(). I am also happy with
>> removing the method. WDYT?
>>
>> Best,
>> Bruno
>>
>> On 09.09.20 03:51, Matthias J. Sax wrote:
>>> Currently we, don't cleanup dead threads, but the KIP proposes to change
>>> this:
>>>
>>>> Stream threads that are in state DEAD will be removed from the stream
>> threads of a Kafka Streams client.
>>>
>>>
>>> -Matthias
>>>
>>> On 9/8/20 2:37 PM, Sophie Blee-Goldman wrote:
>>>> Ah, I forgot about localThreadsMetadata(). In that. case I agree,
>> there's
>>>> no reason
>>>> to introduce a new method when we can get both the names and number of
>> all
>>>> running threads from this.
>>>>
>>>> I assume that we would update localThreadsMetadata to only return
>> currently
>>>> alive threads as part of this KIP -- at a quick glance, it seems like we
>>>> don't do
>>>> any pruning of dead threads at the moment
>>>>
>>>> On Tue, Sep 8, 2020 at 1:58 PM Matthias J. Sax <mj...@apache.org>
>> wrote:
>>>>
>>>>> I am not sure if we need a new method? There is already
>>>>> `localThreadsMetadata()`. What do we gain by adding a new one?
>>>>>
>>>>> Returning the thread's name (as `Optional<String>`) for both add() and
>>>>> remove() is fine with me.
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 9/8/20 12:58 PM, Sophie Blee-Goldman wrote:
>>>>>> Sorry Bruno, I think I missed the end of your message with the
>>>>>> numberOfAliveStreamThreads()
>>>>>> proposal. I agree, that would be better than the alternatives I
>> listed.
>>>>>> That said:
>>>>>>
>>>>>>> They rather suggest that the method returns a list of handles to the
>>>>>> stream threads.
>>>>>>
>>>>>> I hadn't thought of that originally, but now that you mention it, this
>>>>>> might be a good idea.
>>>>>> I don't think we should return actual handles on the threads, but
>> maybe a
>>>>>> list of the thread
>>>>>> names rather than a single number of currently alive threads.
>>>>>>
>>>>>> Since we seem to think it would be difficult if not impossible to keep
>>>>>> track of the number
>>>>>> of running stream threads, we should apply the same reasoning to the
>>>>> names
>>>>>> and not
>>>>>> assume the user can/will keep track of every thread returned by
>>>>>> addStreamThread() or
>>>>>> removeStreamThread(). Users should generally take any required action
>>>>>> immediately
>>>>>> after adding/removing the thread -- eg deregistering the thread
>> metrics
>>>>> --
>>>>>> but it might
>>>>>> still be useful to provide a convenience method listing all of the
>>>>> current
>>>>>> threads
>>>>>>
>>>>>> And of course you could still get the number of threads easily by
>>>>> invoking
>>>>>> size() on the
>>>>>> returned list (or ordered set?).
>>>>>>
>>>>>> On Tue, Sep 8, 2020 at 12:16 PM Bruno Cadonna <br...@confluent.io>
>>>>> wrote:
>>>>>>
>>>>>>> Thank you again for the feedback Sophie!
>>>>>>>
>>>>>>> As I tried to point out in my previous e-mail, removing a stream
>> thread
>>>>>>> from a Kafka Streams client that does not have alive stream threads
>> is
>>>>>>> nothing exceptional for the client per se. However, it can become
>>>>>>> exceptional within the context of the user. For example, if users
>> want
>>>>>>> to remove a stream thread from a client without alive stream threads
>>>>>>> because one if their metrics say so, then this is exceptional in the
>>>>>>> context of that user metric not in the context of the Kafka Streams
>>>>>>> client. In that case, users should throw an exception and handle it.
>>>>>>>
>>>>>>> Regarding returning null, I do not like to return null because from a
>>>>>>> development point of view there is no distinction between returning
>> null
>>>>>>> because we have a bug in the code or returning null because there
>> are no
>>>>>>> alive stream threads. Additionally, Optional<String> makes it more
>>>>>>> explicit that the result could also be empty.
>>>>>>>
>>>>>>> Thank you for the alternative method names! However, with the names
>> you
>>>>>>> propose it is not immediately clear that the method returns an
>> amount of
>>>>>>> stream threads. They rather suggest that the method returns a list of
>>>>>>> handles to the stream threads. I chose to use "aliveStreamThreads"
>> to be
>>>>>>> consistent with the client-level metric "alive-stream-threads" which
>>>>>>> reports the same number of stream threads that
>>>>>>> numberOfAliveStreamThreads() should report. If others also think that
>>>>>>> the proposed name in the KIP is too clumsy, I am open to rename it,
>>>>> though.
>>>>>>>
>>>>>>> Best,
>>>>>>> Bruno
>>>>>>>
>>>>>>>
>>>>>>> On 08.09.20 20:12, Sophie Blee-Goldman wrote:
>>>>>>>>> it's never a good sign when the discussion moves into the vote
>> thread
>>>>>>>>
>>>>>>>> Hah, sorry, the gmail consolidation of [VOTE] and [DISCUSS] threads
>>>>>>> strikes
>>>>>>>> again.
>>>>>>>> Thanks for redirecting me Bruno
>>>>>>>>
>>>>>>>> I suppose it's unfair to expect the callers to keep perfect track of
>>>>> the
>>>>>>>> current
>>>>>>>>    number of stream threads, but it also seems like you shouldn't be
>>>>>>> calling
>>>>>>>> removeStreamThread() when there are no threads left. Either you're
>> just
>>>>>>>> haphazardly removing threads and could unintentionally slip into a
>>>>> state
>>>>>>> of
>>>>>>>> no
>>>>>>>> running threads without realizing it, or more realistically, you're
>>>>>>>> carefully
>>>>>>>> removing threads based on some metric(s) that convey whether the
>> system
>>>>>>> is
>>>>>>>> over or under-provisioned. If your metrics say you're
>> over-provisioned
>>>>>>> but
>>>>>>>> there's
>>>>>>>> not one thread running, well, that certainly sounds exceptional to
>> me.
>>>>> Or
>>>>>>>> you might
>>>>>>>> be right in that the cluster is over-provisioned but have just been
>>>>>>>> directing the
>>>>>>>> removeStreamThread() and addStreamThread() calls to instances at
>>>>> random,
>>>>>>> and
>>>>>>>> end up with one massive instance and one with no threads at all.
>> Again,
>>>>>>>> this
>>>>>>>> probably merits some human intervention (or system redesign)
>>>>>>>>
>>>>>>>> That said, I don't think there's any real harm to just returning
>> null
>>>>> in
>>>>>>>> this case, but I hope
>>>>>>>> that users would pay attention to this since it seems likely to
>>>>> indicate
>>>>>>>> something has gone
>>>>>>>> seriously wrong. I suppose Optional<String> would be a reasonable
>>>>>>>> compromise.
>>>>>>>>
>>>>>>>> As for the method name, what about activeStreamThreads() or
>>>>>>>> liveStreamThreads() ?
>>>>>>>>
>>>>>>>> On Mon, Sep 7, 2020 at 1:45 AM Bruno Cadonna <br...@confluent.io>
>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi John,
>>>>>>>>>
>>>>>>>>> I agree with you except for checking null. I would rather prefer to
>>>>> use
>>>>>>>>> Optional<String> as the return type to both methods.
>>>>>>>>>
>>>>>>>>> I changed the subject from [VOTE] to [DISCUSS] so that we can
>> follow
>>>>> up
>>>>>>>>> in the discussion thread.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Bruno
>>>>>>>>>
>>>>>>>>> On 04.09.20 23:12, John Roesler wrote:
>>>>>>>>>> Hi Sophie,
>>>>>>>>>>
>>>>>>>>>> Uh, oh, it's never a good sign when the discussion moves
>>>>>>>>>> into the vote thread :)
>>>>>>>>>>
>>>>>>>>>> I agree with you, it seems like a good touch for
>>>>>>>>>> removeStreamThread() to return the name of the thread that
>>>>>>>>>> got removed, rather than a boolean flag. Maybe the return
>>>>>>>>>> value would be `null` if there is no thread to remove.
>>>>>>>>>>
>>>>>>>>>> If we go that way, I'd suggest that addStreamThread() also
>>>>>>>>>> return the name of the newly created thread, or null if no
>>>>>>>>>> thread can be created right now.
>>>>>>>>>>
>>>>>>>>>> I'm not completely sure if I think that callers of this
>>>>>>>>>> method would know exactly how many threads there are. Sure,
>>>>>>>>>> if a human being is sitting there looking at the metrics or
>>>>>>>>>> logs and decides to call the method, it would work out, but
>>>>>>>>>> I'd expect this kind of method to find its way into
>>>>>>>>>> automated tooling that reacts to things like current system
>>>>>>>>>> load or resource saturation. Those kinds of toolchains often
>>>>>>>>>> are part of a distributed system, and it's probably not that
>>>>>>>>>> easy to guarantee that the thread count they observe is
>>>>>>>>>> fully consistent with the number of threads that are
>>>>>>>>>> actually running. Therefore, an in-situ `int
>>>>>>>>>> numStreamThreads()` method might not be a bad idea. Then
>>>>>>>>>> again, it seems sort of optional. A caller can catch an
>>>>>>>>>> exception or react to a `null` return value just the same
>>>>>>>>>> either way. Having both add/remove methods behave similarly
>>>>>>>>>> is probably more valuable.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> -John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, 2020-09-03 at 12:15 -0700, Sophie Blee-Goldman
>>>>>>>>>> wrote:
>>>>>>>>>>> Hey, sorry for the late reply, I just have one minor suggestion.
>>>>> Since
>>>>>>>>> we
>>>>>>>>>>> don't
>>>>>>>>>>> make any guarantees about which thread gets removed or allow the
>>>>> user
>>>>>>> to
>>>>>>>>>>> specify, I think we should return either the index or full name
>> of
>>>>> the
>>>>>>>>>>> thread
>>>>>>>>>>> that does get removed by removeThread().
>>>>>>>>>>>
>>>>>>>>>>> I know you just updated the KIP to return true/false if there
>>>>>>>>> are/aren't any
>>>>>>>>>>> threads to be removed, but I think this would be more
>> appropriate as
>>>>>>> an
>>>>>>>>>>> exception than as a return type. I think it's reasonable to
>> expect
>>>>>>>>> users to
>>>>>>>>>>> have some sense to how many threads are remaining, and not try to
>>>>>>> remove
>>>>>>>>>>> a thread when there is none left. To me, that indicates something
>>>>>>> wrong
>>>>>>>>>>> with the user application code and should be treated as an
>>>>> exceptional
>>>>>>>>> case.
>>>>>>>>>>> I don't think the same code clarify argument applies here as to
>> the
>>>>>>>>>>> addStreamThread() case, as there's no reason for an application
>> to
>>>>> be
>>>>>>>>>>> looping and retrying removeStreamThread()  since if that fails,
>> it's
>>>>>>>>> because
>>>>>>>>>>> there are no threads left and thus it will continue to always
>> fail.
>>>>>>> And
>>>>>>>>> if
>>>>>>>>>>> the
>>>>>>>>>>> user actually wants to shut down all threads, they should just
>> close
>>>>>>> the
>>>>>>>>>>> whole application rather than call removeStreamThread() in a
>> loop.
>>>>>>>>>>>
>>>>>>>>>>> While I generally think it should be straightforward for users to
>>>>>>> track
>>>>>>>>> how
>>>>>>>>>>> many stream threads they have running, maybe it would be nice to
>> add
>>>>>>>>>>> a small utility method that does this for them. Something like
>>>>>>>>>>>
>>>>>>>>>>> // Returns the number of currently alive threads
>>>>>>>>>>> boolean runningStreamThreads();
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 3, 2020 at 7:41 AM Matthias J. Sax <mj...@apache.org
>>>
>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1 (binding)
>>>>>>>>>>>>
>>>>>>>>>>>> On 9/3/20 6:16 AM, Bruno Cadonna wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to start the voting on KIP-663 that proposes to
>> add
>>>>>>>>> methods
>>>>>>>>>>>>> to the Kafka Streams client to add and remove stream threads
>>>>> during
>>>>>>>>>>>>> execution.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-663%3A+API+to+Start+and+Shut+Down+Stream+Threads
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Bruno
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to