I would prefer to not add a new method. It seems unnecessary. `localThreadMetadata` does return all threads in all states(*) and thus provides full insight.
(*) A thread in state DEAD could be returned as long as it's not removed yet. I don't see any advantage to pre-filter threads and to exclude threads in state CREATE or PENDING_SHUTDOWN. Even if a CREATED thread is not started yet, it is still "alive" in a broader sense. For example, if a user wants to scale out to 10 thread, and 8 are RUNNING and 2 are in state CREATED, a user won't need to add 2 more threads -- there are already 10 threads. For PENDING_SHUTDOWN and scale in it would be different I guess, as the proposal would be to filter them out right away. However, filtering them seems actually not to be "correct", as a thread in PENDING_SHUTDOWN might still process data and it's thus still "alive". If there is still a need later to add a new method about "alive thread" we can always add as a follow up -- removing things is much harder. I also don't think that there is value in returning names of dead threads, as we recycle names. -Matthias On 9/9/20 10:04 AM, Sophie Blee-Goldman wrote: > I agree that the current behavior of localThreadsMetadata() does not seem > to match, but it seems like we will be forced to change it to only return > currently-alive threads. For one thing, we plan to recycle old thread names. > It would be pretty confusing for a user to get two (or more) ThreadMetadata > objects returned with the same name, since AFAICT this is the only > distinguishing identifier of stream threads. I think we should enforce that > only live threads are returned by localThreadsMetadata(). Plus, as Matthias > pointed out, we plan to remove dead threads from the KafkaStreams client, > so still returning them in the metadata would be extremely odd. > > If we think that there might be some use case that requires knowing which > threads have died, we could consider adding a method that returns the > names of dead threads. But the only use case I can imagine would probably > be better served by a callback that gets invoked when the thread dies, which > we already have. > > On Tue, Sep 8, 2020 at 11:46 PM Bruno Cadonna <br...@confluent.io> wrote: > >> Hi Matthias and Sophie, >> >> I agree that localThreadsMetadata() can be used here. However, >> localThreadsMetadata() returns all stream threads irrespectively of >> their states. Alive stream threads are specified as being in one of the >> following states: RUNNING, STARTING, PARTITIONS_REVOKED, and >> PARTITIONS_ASSIGNED. Hence, users would need to filter the result of >> localThreadsMetadata(). I thought, it would be neat to have a method >> that hides this filtering and returns the number of alive stream >> threads, because that is the most basic information you might need to >> decide about adding or removing stream threads. For all more advanced >> use cases users should use localThreadsMetadata(). I am also happy with >> removing the method. WDYT? >> >> Best, >> Bruno >> >> On 09.09.20 03:51, Matthias J. Sax wrote: >>> Currently we, don't cleanup dead threads, but the KIP proposes to change >>> this: >>> >>>> Stream threads that are in state DEAD will be removed from the stream >> threads of a Kafka Streams client. >>> >>> >>> -Matthias >>> >>> On 9/8/20 2:37 PM, Sophie Blee-Goldman wrote: >>>> Ah, I forgot about localThreadsMetadata(). In that. case I agree, >> there's >>>> no reason >>>> to introduce a new method when we can get both the names and number of >> all >>>> running threads from this. >>>> >>>> I assume that we would update localThreadsMetadata to only return >> currently >>>> alive threads as part of this KIP -- at a quick glance, it seems like we >>>> don't do >>>> any pruning of dead threads at the moment >>>> >>>> On Tue, Sep 8, 2020 at 1:58 PM Matthias J. Sax <mj...@apache.org> >> wrote: >>>> >>>>> I am not sure if we need a new method? There is already >>>>> `localThreadsMetadata()`. What do we gain by adding a new one? >>>>> >>>>> Returning the thread's name (as `Optional<String>`) for both add() and >>>>> remove() is fine with me. >>>>> >>>>> >>>>> -Matthias >>>>> >>>>> On 9/8/20 12:58 PM, Sophie Blee-Goldman wrote: >>>>>> Sorry Bruno, I think I missed the end of your message with the >>>>>> numberOfAliveStreamThreads() >>>>>> proposal. I agree, that would be better than the alternatives I >> listed. >>>>>> That said: >>>>>> >>>>>>> They rather suggest that the method returns a list of handles to the >>>>>> stream threads. >>>>>> >>>>>> I hadn't thought of that originally, but now that you mention it, this >>>>>> might be a good idea. >>>>>> I don't think we should return actual handles on the threads, but >> maybe a >>>>>> list of the thread >>>>>> names rather than a single number of currently alive threads. >>>>>> >>>>>> Since we seem to think it would be difficult if not impossible to keep >>>>>> track of the number >>>>>> of running stream threads, we should apply the same reasoning to the >>>>> names >>>>>> and not >>>>>> assume the user can/will keep track of every thread returned by >>>>>> addStreamThread() or >>>>>> removeStreamThread(). Users should generally take any required action >>>>>> immediately >>>>>> after adding/removing the thread -- eg deregistering the thread >> metrics >>>>> -- >>>>>> but it might >>>>>> still be useful to provide a convenience method listing all of the >>>>> current >>>>>> threads >>>>>> >>>>>> And of course you could still get the number of threads easily by >>>>> invoking >>>>>> size() on the >>>>>> returned list (or ordered set?). >>>>>> >>>>>> On Tue, Sep 8, 2020 at 12:16 PM Bruno Cadonna <br...@confluent.io> >>>>> wrote: >>>>>> >>>>>>> Thank you again for the feedback Sophie! >>>>>>> >>>>>>> As I tried to point out in my previous e-mail, removing a stream >> thread >>>>>>> from a Kafka Streams client that does not have alive stream threads >> is >>>>>>> nothing exceptional for the client per se. However, it can become >>>>>>> exceptional within the context of the user. For example, if users >> want >>>>>>> to remove a stream thread from a client without alive stream threads >>>>>>> because one if their metrics say so, then this is exceptional in the >>>>>>> context of that user metric not in the context of the Kafka Streams >>>>>>> client. In that case, users should throw an exception and handle it. >>>>>>> >>>>>>> Regarding returning null, I do not like to return null because from a >>>>>>> development point of view there is no distinction between returning >> null >>>>>>> because we have a bug in the code or returning null because there >> are no >>>>>>> alive stream threads. Additionally, Optional<String> makes it more >>>>>>> explicit that the result could also be empty. >>>>>>> >>>>>>> Thank you for the alternative method names! However, with the names >> you >>>>>>> propose it is not immediately clear that the method returns an >> amount of >>>>>>> stream threads. They rather suggest that the method returns a list of >>>>>>> handles to the stream threads. I chose to use "aliveStreamThreads" >> to be >>>>>>> consistent with the client-level metric "alive-stream-threads" which >>>>>>> reports the same number of stream threads that >>>>>>> numberOfAliveStreamThreads() should report. If others also think that >>>>>>> the proposed name in the KIP is too clumsy, I am open to rename it, >>>>> though. >>>>>>> >>>>>>> Best, >>>>>>> Bruno >>>>>>> >>>>>>> >>>>>>> On 08.09.20 20:12, Sophie Blee-Goldman wrote: >>>>>>>>> it's never a good sign when the discussion moves into the vote >> thread >>>>>>>> >>>>>>>> Hah, sorry, the gmail consolidation of [VOTE] and [DISCUSS] threads >>>>>>> strikes >>>>>>>> again. >>>>>>>> Thanks for redirecting me Bruno >>>>>>>> >>>>>>>> I suppose it's unfair to expect the callers to keep perfect track of >>>>> the >>>>>>>> current >>>>>>>> number of stream threads, but it also seems like you shouldn't be >>>>>>> calling >>>>>>>> removeStreamThread() when there are no threads left. Either you're >> just >>>>>>>> haphazardly removing threads and could unintentionally slip into a >>>>> state >>>>>>> of >>>>>>>> no >>>>>>>> running threads without realizing it, or more realistically, you're >>>>>>>> carefully >>>>>>>> removing threads based on some metric(s) that convey whether the >> system >>>>>>> is >>>>>>>> over or under-provisioned. If your metrics say you're >> over-provisioned >>>>>>> but >>>>>>>> there's >>>>>>>> not one thread running, well, that certainly sounds exceptional to >> me. >>>>> Or >>>>>>>> you might >>>>>>>> be right in that the cluster is over-provisioned but have just been >>>>>>>> directing the >>>>>>>> removeStreamThread() and addStreamThread() calls to instances at >>>>> random, >>>>>>> and >>>>>>>> end up with one massive instance and one with no threads at all. >> Again, >>>>>>>> this >>>>>>>> probably merits some human intervention (or system redesign) >>>>>>>> >>>>>>>> That said, I don't think there's any real harm to just returning >> null >>>>> in >>>>>>>> this case, but I hope >>>>>>>> that users would pay attention to this since it seems likely to >>>>> indicate >>>>>>>> something has gone >>>>>>>> seriously wrong. I suppose Optional<String> would be a reasonable >>>>>>>> compromise. >>>>>>>> >>>>>>>> As for the method name, what about activeStreamThreads() or >>>>>>>> liveStreamThreads() ? >>>>>>>> >>>>>>>> On Mon, Sep 7, 2020 at 1:45 AM Bruno Cadonna <br...@confluent.io> >>>>> wrote: >>>>>>>> >>>>>>>>> Hi John, >>>>>>>>> >>>>>>>>> I agree with you except for checking null. I would rather prefer to >>>>> use >>>>>>>>> Optional<String> as the return type to both methods. >>>>>>>>> >>>>>>>>> I changed the subject from [VOTE] to [DISCUSS] so that we can >> follow >>>>> up >>>>>>>>> in the discussion thread. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Bruno >>>>>>>>> >>>>>>>>> On 04.09.20 23:12, John Roesler wrote: >>>>>>>>>> Hi Sophie, >>>>>>>>>> >>>>>>>>>> Uh, oh, it's never a good sign when the discussion moves >>>>>>>>>> into the vote thread :) >>>>>>>>>> >>>>>>>>>> I agree with you, it seems like a good touch for >>>>>>>>>> removeStreamThread() to return the name of the thread that >>>>>>>>>> got removed, rather than a boolean flag. Maybe the return >>>>>>>>>> value would be `null` if there is no thread to remove. >>>>>>>>>> >>>>>>>>>> If we go that way, I'd suggest that addStreamThread() also >>>>>>>>>> return the name of the newly created thread, or null if no >>>>>>>>>> thread can be created right now. >>>>>>>>>> >>>>>>>>>> I'm not completely sure if I think that callers of this >>>>>>>>>> method would know exactly how many threads there are. Sure, >>>>>>>>>> if a human being is sitting there looking at the metrics or >>>>>>>>>> logs and decides to call the method, it would work out, but >>>>>>>>>> I'd expect this kind of method to find its way into >>>>>>>>>> automated tooling that reacts to things like current system >>>>>>>>>> load or resource saturation. Those kinds of toolchains often >>>>>>>>>> are part of a distributed system, and it's probably not that >>>>>>>>>> easy to guarantee that the thread count they observe is >>>>>>>>>> fully consistent with the number of threads that are >>>>>>>>>> actually running. Therefore, an in-situ `int >>>>>>>>>> numStreamThreads()` method might not be a bad idea. Then >>>>>>>>>> again, it seems sort of optional. A caller can catch an >>>>>>>>>> exception or react to a `null` return value just the same >>>>>>>>>> either way. Having both add/remove methods behave similarly >>>>>>>>>> is probably more valuable. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> -John >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Thu, 2020-09-03 at 12:15 -0700, Sophie Blee-Goldman >>>>>>>>>> wrote: >>>>>>>>>>> Hey, sorry for the late reply, I just have one minor suggestion. >>>>> Since >>>>>>>>> we >>>>>>>>>>> don't >>>>>>>>>>> make any guarantees about which thread gets removed or allow the >>>>> user >>>>>>> to >>>>>>>>>>> specify, I think we should return either the index or full name >> of >>>>> the >>>>>>>>>>> thread >>>>>>>>>>> that does get removed by removeThread(). >>>>>>>>>>> >>>>>>>>>>> I know you just updated the KIP to return true/false if there >>>>>>>>> are/aren't any >>>>>>>>>>> threads to be removed, but I think this would be more >> appropriate as >>>>>>> an >>>>>>>>>>> exception than as a return type. I think it's reasonable to >> expect >>>>>>>>> users to >>>>>>>>>>> have some sense to how many threads are remaining, and not try to >>>>>>> remove >>>>>>>>>>> a thread when there is none left. To me, that indicates something >>>>>>> wrong >>>>>>>>>>> with the user application code and should be treated as an >>>>> exceptional >>>>>>>>> case. >>>>>>>>>>> I don't think the same code clarify argument applies here as to >> the >>>>>>>>>>> addStreamThread() case, as there's no reason for an application >> to >>>>> be >>>>>>>>>>> looping and retrying removeStreamThread() since if that fails, >> it's >>>>>>>>> because >>>>>>>>>>> there are no threads left and thus it will continue to always >> fail. >>>>>>> And >>>>>>>>> if >>>>>>>>>>> the >>>>>>>>>>> user actually wants to shut down all threads, they should just >> close >>>>>>> the >>>>>>>>>>> whole application rather than call removeStreamThread() in a >> loop. >>>>>>>>>>> >>>>>>>>>>> While I generally think it should be straightforward for users to >>>>>>> track >>>>>>>>> how >>>>>>>>>>> many stream threads they have running, maybe it would be nice to >> add >>>>>>>>>>> a small utility method that does this for them. Something like >>>>>>>>>>> >>>>>>>>>>> // Returns the number of currently alive threads >>>>>>>>>>> boolean runningStreamThreads(); >>>>>>>>>>> >>>>>>>>>>> On Thu, Sep 3, 2020 at 7:41 AM Matthias J. Sax <mj...@apache.org >>> >>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> +1 (binding) >>>>>>>>>>>> >>>>>>>>>>>> On 9/3/20 6:16 AM, Bruno Cadonna wrote: >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> I would like to start the voting on KIP-663 that proposes to >> add >>>>>>>>> methods >>>>>>>>>>>>> to the Kafka Streams client to add and remove stream threads >>>>> during >>>>>>>>>>>>> execution. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-663%3A+API+to+Start+and+Shut+Down+Stream+Threads >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Bruno >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >> >
signature.asc
Description: OpenPGP digital signature