Hi Chris, Here is an example of logs when I see proper shutdown:
https://gist.githubusercontent.com/zcox/6ec8910bd3f18e36c1a2/raw/666eae24511490bf51a66e56fd90c794ea6b6282/stdout That does show the InterruptedException "Got interrupt exception in broker proxy thread." line. Here's java -version output on the Samza container: java version "1.6.0_33" OpenJDK Runtime Environment (IcedTea6 1.13.5) (6b33-1.13.5-1ubuntu0.14.04) OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode) Currently all of this stuff (Mesos, Kafka, Samza, etc) is running on my laptop. I've just been using the released Samza 0.8.0 jars from maven central - will try to get Samza building locally and add that tight interrupt loop, and see what happens. Thanks, Zach On Thu, Jan 15, 2015 at 1:11 PM, Chris Riccomini < [email protected]> wrote: > Hey Zach, > > Can you try modifying this line in BrokerProxy: > > > https://github.com/apache/incubator-samza/blob/0.8.0/samza-kafka/src/main/s > cala/org/apache/samza/system/kafka/BrokerProxy.scala#L283 > > To interrupt a lot in a tight loop and continue checking the thread > status? Something like: > > while(thread.isAlive) { > thread.interrupt > } > > I want to see if we can catch the thread at a point when it's sleeping, > not doing IO on the wire. If we can, the thread should shutdown cleanly, > and we should see the "Got interrupt exception in broker proxy thread." > exception. > > Cheers, > Chris > > On 1/15/15 11:01 AM, "Chris Riccomini" <[email protected]> wrote: > > >Hey Zach, > > > >I'm beginning to wonder if this line is the problem: > > > > > > > https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/consu > >m > >er/SimpleConsumer.scala#L75 > > > >If the sendRequest in SimpleConsumer catches a Throwable, it will catch > >ClosedByInterruptException. This will cause the interrupt flag to get > >reset, I think. If the consumer the retries the send, and it succeeds, > >then I think it will continue on and ignore the interrupt. Going to poke a > >Kafka dev to see if I am misunderstanding something. > > > >Cheers, > >Chris > > > >On 1/15/15 10:52 AM, "Chris Riccomini" <[email protected]> wrote: > > > >>Hey Zach, > >> > >>Hmm. It sounds like something is catching the InterruptedException, and > >>letting the interrupt flag get reset. If that happens, the > >>ExponentialSleepStrategy [1] will continue running. > >> > >>I walked through the code, but couldn't find where it might be doing so, > >>but it's hard to catch these without a test. > >> > >>> I have never seen that "Restarting consumer due to ..." warn line in > >>>the > >>>logs. > >> > >>If that's the case, then I don't think the BrokerProxy can be > >>reconnecting > >>since reconnect must be set to true, which would require seeing that log > >>line [2]. > >> > >>Out of curiosity, what version of the JVM are you using? Trying to > >>replicate this is proving tricky. > >> > >>I'm beginning to wonder if this has to do with InterruptedException vs. > >>ClosedByInterruptException. In TestStatefulTask, I only see this the > >>first > >>log line: > >> > >> case e: InterruptedException => info("Got interrupt exception in broker > >>proxy thread.") > >> case e: ClosedByInterruptException => info("Got closed by interrupt > >>exception in broker proxy thread.") > >> > >>This is probably because the test connects to a local Kafka broker, so > >>nio > >>socket operations are extremely fast, and the test almost always gets > >>interrupted on a Thread.sleep, not a blocking socket operation. I'm > >>wondering if a ClosedByInterruptException exception would cause the > >>hanging that you're seeing. > >> > >> > >>Out of curiosity, is your container on the same machine as your broker, > >>or > >>a different machine? > >> > >>Cheers, > >>Chris > >> > >>[1] > >> > https://github.com/apache/incubator-samza/blob/0.8.0/samza-core/src/main/ > >>s > >>c > >>ala/org/apache/samza/util/ExponentialSleepStrategy.scala#L80 > >>[2] > >> > https://github.com/apache/incubator-samza/blob/0.8.0/samza-kafka/src/main > >>/ > >>s > >>cala/org/apache/samza/system/kafka/BrokerProxy.scala#L133 > >> > >>On 1/15/15 9:38 AM, "Zach Cox" <[email protected]> wrote: > >> > >>>Hi Chris, > >>> > >>> > >>>> Looking at your logs (from previous email, granted, different > >>>>execution), > >>>> I see that we get as far as SimpleConsumer.disconnect(): > >>>> > >>>> [DEBUG] [2015-01-14 17:19:12,596] [SAMZA-BROKER-PROXY-BrokerProxy > >>>>...] > >>>> o.a.s.s.k.DefaultFetchSimpleConsumer: Disconnecting from > >>>>localhost:9192 > >>>> > >>>> This is invoked by BrokerProxy.scala:135. The only way this should get > >>>> invoked is if an exception is caught. Do you see this log line > >>>>anywhere? > >>>> > >>>> warn("Restarting consumer due to %s. Turn on debugging to get a full > >>>> stack trace." format exception) > >>>> > >>> > >>>DefaultFetchConsumer.close() calls SimpleConsumer.close(), which calls > >>>disconnect(), which also logs that "Disconnecting from localhost:9192" > >>>line: > >>> > >>> > https://github.com/apache/incubator-samza/blob/master/samza-kafka/src/ma > >>>i > >>>n > >>>/scala/org/apache/samza/system/kafka/DefaultFetchSimpleConsumer.scala#L4 > >>>4 > >>> > >>> > https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/con > >>>s > >>>u > >>>mer/SimpleConsumer.scala#L50 > >>> > >>>I have never seen that "Restarting consumer due to ..." warn line in the > >>>logs. > >>> > >>>Also note that I'm using Samza 0.8.0, not master. > >>> > >>> > >>>> I ran our TestStatefulTask test a few times, and verified that the > >>>> consumer CAN be shutdown: > >>>> > >>>> [ThreadJob] INFO org.apache.samza.container.SamzaContainer - > >>>>Shutting > >>>> down consumer multiplexer. > >>>> [ThreadJob] INFO org.apache.samza.system.kafka.BrokerProxy - > >>>>Shutting > >>>> down BrokerProxy for localhost:59075 > >>>> 2015-01-15 08:59:26 DefaultFetchSimpleConsumer [WARN] Reconnect due > >>>>to > >>>> socket error: null > >>>> [SAMZA-BROKER-PROXY-BrokerProxy thread pointed at ...] INFO > >>>> org.apache.samza.system.kafka.BrokerProxy - Got closed by interrupt > >>>> exception in broker proxy thread. > >>>> [SAMZA-BROKER-PROXY-BrokerProxy thread pointed at ...] INFO > >>>> org.apache.samza.system.kafka.BrokerProxy - Shutting down due to > >>>>interrupt. > >>>> // consumer shutdown is complete here.. The container has moved on > >>>>to > >>>> shutting down the producer > >>>> [ThreadJob] INFO org.apache.samza.container.SamzaContainer - > >>>>Shutting > >>>> down producer multiplexer. > >>>> > >>> > >>>Yesterday afternoon I was showing this to a colleague, and we did > >>>observe > >>>a > >>>proper clean shutdown once. Then the next 2 times I showed him we saw > >>>the > >>>BrokerProxy blocking issue. So I also observe proper shutdown sometimes > >>>too, just seems a lot more often that BrokerProxy blocks and it doesn't > >>>shut down. > >>> > >>>Thanks, > >>>Zach > >>> > >>> > >>> > >>> > >>>> > >>>> > >>>> Cheers, > >>>> Chris > >>>> > >>>> On 1/14/15 7:24 PM, "Zach Cox" <[email protected]> wrote: > >>>> > >>>> >Hi Chris, > >>>> > > >>>> >I did a thread dump during the shutdown process, when I could tell it > >>>> >wasn't shutting down properly [1]. You can see this thread dump in > >>>>the > >>>> >context of the other logging at shutdown. > >>>> > > >>>> >The main thread is indeed blocked on the Thread.join call in > >>>> >BrokerProxy.stop. The BrokerProxy thread looks like it's still > >>>>consuming > >>>> >from Kafka? > >>>> > > >>>> >The thread dump was a bit tricky; this Samza container is running in > >>>>a > >>>> >Docker container on boot2docker, which is Tiny Core Linux, i.e. no > >>>>jstack. > >>>> >I never new this, but `sudo kill -3 [pid]` tells the jvm to do a > >>>>thread > >>>> >dump to stdout [2] :) > >>>> > > >>>> >Thanks, > >>>> >Zach > >>>> > > >>>> >[1] > >>>> > > >>>> > >>>> > https://gist.githubusercontent.com/zcox/bb47a61d4ae1acd54056/raw/83224d > >>>>2 > >>>>0 > >>>>a > >>>> >61de20bb47bcf916d92930a71cd97ad/gistfile1.txt > >>>> > > >>>> >[2] > >>>> > > >>>> > >>>> > http://serverfault.com/questions/574745/jstack-alternative-for-linux-re > >>>>d > >>>>h > >>>>a > >>>> >t-6 > >>>> > > >>>> > > >>>> >On Wed, Jan 14, 2015 at 7:30 PM, Chris Riccomini < > >>>> >[email protected]> wrote: > >>>> > > >>>> >> Hey Zach, > >>>> >> > >>>> >> It sounds likely that the BrokerProxy thread is blocking or > >>>>improperly > >>>> >> catching an interrupt exception. > >>>> >> > >>>> >> Can you take a thread dump? My guess is that you'll see that the > >>>>main > >>>> >> thread is blocked on BrokerProxy.stop: > >>>> >> > >>>> >> thread.interrupt > >>>> >> thread.join > >>>> >> > >>>> >> > >>>> >> It'll likely be blocked on thread.join. If that's the case, I'd > >>>>like > >>>>to > >>>> >> see what the BrokerProxy thread is doing. This line indicates that > >>>>the > >>>> >> thread seems to be shutting down: > >>>> >> > >>>> >> [DEBUG] [2015-01-14 17:19:12,596] [SAMZA-BROKER-PROXY-BrokerProxy > >>>>thread > >>>> >> pointed at localhost:9192 for client > >>>> >> samza_consumer-twitter_message_separation-1-1421255914862-1] > >>>> >> o.a.s.s.k.DefaultFetchSimpleConsumer: Disconnecting from > >>>>localhost:9192 > >>>> >> > >>>> >> But if thread.join is blocking, then the shutdown never seems to > >>>> >>complete. > >>>> >> It'd be good to see where these threads are at. > >>>> >> > >>>> >> > >>>> >> Cheers, > >>>> >> Chris > >>>> >> > >>>> >> On 1/14/15 10:49 AM, "Zach Cox" <[email protected]> wrote: > >>>> >> > >>>> >> >Hi - related to the discussion yesterday about graceful shutdown > >>>>[1], > >>>> >> >today > >>>> >> >I can't seem to get the SamzaContainer to actually shut down. > >>>> >>Yesterday I > >>>> >> >was seeing nice, fast, complete shutdown logs [2]. However, today > >>>>the > >>>> >>last > >>>> >> >log line I see is related to shutting down BrokerProxy, then > >>>>nothing > >>>> >>else > >>>> >> >happens until the container is finally just SIGKILLed by Docker > >>>>[3]. > >>>> >> > > >>>> >> >The consistency with which clean, fast shutdowns were happening > >>>> >>yesterday > >>>> >> >and now they are never happening today leads me to believe I've > >>>>changed > >>>> >> >something, but I really can't find anything that would lead to > >>>>this. > >>>> >>It's > >>>> >> >almost like there's some deadlock in BrokerProxy, but I would > >>>>think > >>>> >>that > >>>> >> >would show up randomly - I've tried this about 100 times today, > >>>>and > >>>> >>every > >>>> >> >time the logs are as shown in [3]. Today I also see 2 exceptions > >>>>in > >>>>the > >>>> >> >Kafka broker logs [4] but I don't know if those were occurring > >>>> >>yesterday > >>>> >> >when Samza was cleanly shutting down, or not. > >>>> >> > > >>>> >> >One thing that doesn't seem to be happening today is this log line > >>>>from > >>>> >> >BrokerProxy: "Got interrupt exception in broker proxy thread." > >>>> >> > > >>>> >> >Has anyone seen this, or have any advice on what to try next? > >>>> >> > > >>>> >> >Thanks, > >>>> >> >Zach > >>>> >> > > >>>> >> > > >>>> >> >[1] > >>>> >> > >>>> >>> > >>>> > >>>> > http://www.mail-archive.com/[email protected]/msg02246.htm > >>>>l > >>>> >> > > >>>> >> >[2] > >>>> >> > > >>>> >> > >>>> >> > >>>> > >>>> > https://gist.githubusercontent.com/zcox/6ec8910bd3f18e36c1a2/raw/666eae > >>>>2 > >>>>4 > >>>> >>5 > >>>> >> >11490bf51a66e56fd90c794ea6b6282/stdout > >>>> >> > > >>>> >> >[3] > >>>> >> > > >>>> >> > >>>> >> > >>>> > >>>> > https://gist.githubusercontent.com/zcox/673a2ba607c566de7650/raw/d29ea8 > >>>>b > >>>>9 > >>>> >>9 > >>>> >> >868da9cc4dc1db198315ee1d03bc694/gistfile1.txt > >>>> >> > > >>>> >> >[4] > >>>> >> > > >>>> >> > >>>> >> > >>>> > >>>> > https://gist.githubusercontent.com/zcox/f08de55c5d5fe2d70cde/raw/c4be45 > >>>>3 > >>>>b > >>>> >>5 > >>>> >> >f6a57cfb5acb08508667f4b8ce8c2bd/gistfile1.txt > >>>> >> > >>>> >> > >>>> > >>>> > >> > > > >
