Update:
I just merged Kiley's https://github.com/apache/beam/pull/14833, in which I
triggered "Run Java Precommit" several times and didn't observe the logging
test (BeamFnLoggingServiceTest) failures. Let's see how the builds go.


Kenn, Ismaël, and Kiley,
Thank you for the help and follow-up!


On Thu, May 13, 2021 at 10:39 AM Tomo Suzuki <[email protected]> wrote:

> I'm giving up! Can anyone troubleshoot this gRPC concurrency problem
> further?
> My current view of the problem (link
> <https://github.com/apache/beam/pull/14768#issuecomment-840576342>) is
> that "grpc-default-executor" threads stop processing the data. But I cannot
> tell why.
>
> I also raised a question with grpc-java on how best to troubleshoot such a
> situation:
> https://github.com/grpc/grpc-java/issues/8174
>
> On Wed, May 12, 2021 at 11:29 PM Tomo Suzuki <[email protected]> wrote:
>
>> Update: the root cause is still unknown.
>>
>> From my observations with debug logging and thread dumps, the
>> "grpc-default-executor-XXX" threads disappear when the problematic tests
>> hang.
>> More notes:
>> https://github.com/apache/beam/pull/14768#issuecomment-840228795
>>
>> Interestingly, the "grpc-default-executor-XXX" threads reappear in the
>> logs when the pause triggers the 5-second timeout set by JUnit.
>>
>>
>> On Tue, May 11, 2021 at 1:12 PM Tomo Suzuki <[email protected]> wrote:
>>
>>> Thank you for the advice. Yes, the latch not being counted down is the
>>> problem (my memo:
>>> https://github.com/apache/beam/pull/14474#discussion_r619557479 ). I'll
>>> need to figure out why the withOnError callback is not called.
>>>
>>>
>>> > Can you repro locally?
>>>
>>> No, the task succeeds in my environment (./gradlew
>>> :runners:google-cloud-dataflow-java:worker:test).
>>>
>>>
>>> On Tue, May 11, 2021 at 12:34 PM Kenneth Knowles <[email protected]>
>>> wrote:
>>>
>>>> I am not sure how much of the test code you have read, so apologies if
>>>> I am saying things you already know. The test does something like:
>>>>
>>>>  - start a logging service
>>>>  - set up some stub clients, each with onError wired up to release a
>>>> countdown latch
>>>>  - send error responses to all three of them (actually it sends the
>>>> error in the same task it creates the stub)
>>>>  - each task waits on the latch
>>>>
>>>> So if onError is not delivered, or does not release the countdown
>>>> latch, the test will hang. I notice in the gist you provided that all
>>>> three stub clients are hung awaiting the latch. That is suspicious to me. I
>>>> would want to confirm whether the flakiness always occurs in a way that
>>>> hangs all three. Then there are gRPC workers waiting on empty queues, and
>>>> the main test thread waiting for the hung tasks to complete.
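The hang described above can be reproduced in miniature with plain java.util.concurrent. This is a minimal sketch under stated assumptions, not the actual BeamFnLoggingServiceTest code: the gRPC stub is replaced by a hypothetical Consumer<Throwable> playing the role of onError, and LatchHangSketch and runClients are made-up names. Each task wires its callback to a shared CountDownLatch, optionally delivers its error, and then waits on the latch. When no error arrives, all three tasks are stuck on the latch, mirroring the "all three stub clients are hung" observation. A bounded await is used so that the stuck case returns false instead of hanging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class LatchHangSketch {
  // Hypothetical stand-in for the test: each of `clients` tasks wires an onError
  // callback to a shared latch, optionally delivers an error, then waits on the latch.
  // Returns true only if every task saw the latch released within the timeout.
  static boolean runClients(int clients, boolean deliverErrors) {
    CountDownLatch errorsReceived = new CountDownLatch(clients);
    // One thread per client so all tasks run (and wait) concurrently.
    ExecutorService executor = Executors.newFixedThreadPool(clients);
    try {
      List<Callable<Boolean>> tasks = new ArrayList<>();
      for (int i = 0; i < clients; i++) {
        tasks.add(() -> {
          Consumer<Throwable> onError = t -> errorsReceived.countDown();
          if (deliverErrors) {
            onError.accept(new RuntimeException("client error"));
          }
          // Bounded wait: false means the latch was never fully counted down.
          return errorsReceived.await(2, TimeUnit.SECONDS);
        });
      }
      boolean allReleased = true;
      for (Future<Boolean> result : executor.invokeAll(tasks)) {
        allReleased &= result.get();
      }
      return allReleased;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(runClients(3, true));  // prints true
    System.out.println(runClients(3, false)); // prints false (after about 2 seconds)
  }
}
```

Using await with a timeout instead of the no-argument await() turns an indefinite hang into an explicit failure signal.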
>>>>
>>>> The problem could be something about the test setup. Personally I
>>>> would add a ton of logs, or potentially use a debugger, to confirm exactly
>>>> the state of things when it hangs. Can you repro locally? I think this same
>>>> functionality could be tested in different ways that might remove some of
>>>> the variables. For example starting up all the waiting tasks, then sending
>>>> all the onError messages that should cause them to terminate.
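The alternative structure suggested above (start all the waiting tasks first, then send all the onError messages) could look roughly like the following. This is a hypothetical sketch using only java.util.concurrent, not the Beam test: a Consumer<Throwable> stands in for each stub client's onError, and TwoPhaseLatchSketch and runTwoPhase are made-up names. A `registered` latch guarantees every waiter exists before any error is delivered, and the errors are sent from the calling thread rather than from the task that waits, which removes one variable from the setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class TwoPhaseLatchSketch {
  // Phase 1 starts every waiting task and waits until each has registered its callback.
  // Phase 2 delivers all the errors from the calling thread, then collects the results.
  static boolean runTwoPhase(int clients) {
    CountDownLatch registered = new CountDownLatch(clients);
    CountDownLatch errorsReceived = new CountDownLatch(clients);
    // Hypothetical stand-ins for the stub clients' onError handlers.
    List<Consumer<Throwable>> callbacks = new CopyOnWriteArrayList<>();
    ExecutorService executor = Executors.newFixedThreadPool(clients);
    try {
      List<Future<Boolean>> waiters = new ArrayList<>();
      for (int i = 0; i < clients; i++) {
        waiters.add(executor.submit(() -> {
          callbacks.add(error -> errorsReceived.countDown());
          registered.countDown();
          // Bounded wait: false means not every error was delivered in time.
          return errorsReceived.await(2, TimeUnit.SECONDS);
        }));
      }
      // Phase 1 check: give up if the waiters never all started.
      if (!registered.await(2, TimeUnit.SECONDS)) {
        return false;
      }
      // Phase 2: send every onError only after all waiters exist.
      for (Consumer<Throwable> onError : callbacks) {
        onError.accept(new RuntimeException("client error"));
      }
      boolean allReleased = true;
      for (Future<Boolean> waiter : waiters) {
        allReleased &= waiter.get();
      }
      return allReleased;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println(runTwoPhase(3)); // prints true
  }
}
```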
>>>>
>>>> Since this is a unit test, adding a timeout to just that method should
>>>> save time (but will make it harder to capture stack traces, etc.). I've
>>>> opened up https://github.com/apache/beam/pull/14781 for that. There
>>>> may be a nice way to add a timeout to the executor to capture the hung
>>>> stack, but I didn't look for it.
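One possible way to put the timeout on the executor while still capturing the hung stacks is sketched below. It is an assumption, not something the thread confirms, and it is separate from the per-method timeout in PR 14781: wait on the task's Future with a deadline and, on TimeoutException, collect every live thread's stack with Thread.getAllStackTraces() before cancelling. TimeoutWithStackDump and runOrDumpStacks are hypothetical names; only standard JDK APIs are used.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutWithStackDump {
  // Runs `task` with a deadline. On timeout, returns a text dump of every live
  // thread's name, state, and stack frames; returns null if the task finished in time.
  static String runOrDumpStacks(Callable<?> task, long timeoutMillis) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Future<?> future = executor.submit(task);
    try {
      future.get(timeoutMillis, TimeUnit.MILLISECONDS);
      return null;
    } catch (TimeoutException e) {
      StringBuilder dump = new StringBuilder();
      for (Map.Entry<Thread, StackTraceElement[]> entry
          : Thread.getAllStackTraces().entrySet()) {
        Thread thread = entry.getKey();
        dump.append('"').append(thread.getName()).append("\" ")
            .append(thread.getState()).append('\n');
        for (StackTraceElement frame : entry.getValue()) {
          dump.append("    at ").append(frame).append('\n');
        }
      }
      return dump.toString();
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      // Interrupt the hung task so the executor thread does not linger.
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    // A task that blocks on a latch nobody counts down, like the hung stub clients.
    CountDownLatch neverReleased = new CountDownLatch(1);
    String dump = runOrDumpStacks(() -> {
      neverReleased.await();
      return null;
    }, 500);
    System.out.println(dump);
  }
}
```

The dump is collected while the task is still blocked, so the stuck thread appears in WAITING state with its CountDownLatch.await frames.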
>>>>
>>>> Kenn
>>>>
>>>> On Tue, May 11, 2021 at 7:36 AM Tomo Suzuki <[email protected]> wrote:
>>>>
>>>>> gRPC 1.37.0 showed the same problem:
>>>>> BeamFnLoggingServiceTest.testMultipleClientsFailingIsHandledGracefullyByServer
>>>>> waits for its tasks forever, causing the Java precommit to time out.
>>>>>
>>>>> While I continue my investigation, I would appreciate it if someone
>>>>> knows the cause of the problem. I pasted the thread dump of the Java
>>>>> process taken while the test was frozen:
>>>>> https://github.com/apache/beam/pull/14768
>>>>>
>>>>> If this mystery is never solved, vendoring the somewhat older gRPC
>>>>> 1.32.2 without the jboss dependencies is an alternative option (suggested
>>>>> by Kenn; memo
>>>>> <https://issues.apache.org/jira/browse/BEAM-11227?focusedCommentId=17318238&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17318238>
>>>>> ).
>>>>>
>>>>> Regards,
>>>>> Tomo
>>>>>
>>>>>
>>>>> On Mon, May 10, 2021 at 9:40 AM Tomo Suzuki <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I was investigating the strange timeout (
>>>>>> https://github.com/apache/beam/pull/14474) but have been occupied with
>>>>>> something else lately.
>>>>>> Let me try the new version today to see whether it helps.
>>>>>>
>>>>>>
>>>>>> On Mon, May 10, 2021 at 4:57 AM Ismaël Mejía <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I just saw that gRPC 1.37.1 is out now (with aarch64 support for
>>>>>>> Python!), which made me wonder about this: what is the current status
>>>>>>> of upgrading the vendored dependency, Tomo?
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Apr 8, 2021 at 4:16 PM Tomo Suzuki <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> We observed that the Java Precommit cron job for the master branch
>>>>>>>> has been timing out often (though not always) since the gRPC upgrade.
>>>>>>>> https://github.com/apache/beam/pull/14466#issuecomment-815343974
>>>>>>>>
>>>>>>>> After exchanging messages with Kenn, I reverted the change; the
>>>>>>>> master branch now uses the vendored gRPC 1.26.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Mar 31, 2021 at 11:40 AM Kenneth Knowles <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Merged. Let's keep an eye out for trouble, and I will incorporate it
>>>>>>>>> into the release branch.
>>>>>>>>>
>>>>>>>>> Kenn
>>>>>>>>>
>>>>>>>>> On Wed, Mar 31, 2021 at 6:45 AM Tomo Suzuki <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Regarding troubleshooting the build timeouts, it seems that the
>>>>>>>>>> Docker cache on the Jenkins machines might be playing a role. As I
>>>>>>>>>> ran "Java Presubmit" more times, I no longer observed timeouts in
>>>>>>>>>> the PR.
>>>>>>>>>>
>>>>>>>>>> Kenn, would you merge the PR?
>>>>>>>>>> https://github.com/apache/beam/pull/14295 (all checks green,
>>>>>>>>>> including the new Java postcommit checks)
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 25, 2021 at 5:24 PM Kenneth Knowles <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yes, I agree this might be a good idea. This is not the only
>>>>>>>>>>> major issue on the release-2.29.0 branch.
>>>>>>>>>>>
>>>>>>>>>>> The counterargument is that we will be pulling in all the bugs
>>>>>>>>>>> introduced to `master` since the branch cut.
>>>>>>>>>>>
>>>>>>>>>>> As far as effort goes, I have been mostly focused on burning
>>>>>>>>>>> down the bugs so I would not lose much work in the release process.
>>>>>>>>>>>
>>>>>>>>>>> Kenn
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Mar 25, 2021 at 1:42 PM Ismaël Mejía <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Precommit has been quite unstable in the last few days, so it is
>>>>>>>>>>>> worth checking whether something is wrong in the CI.
>>>>>>>>>>>>
>>>>>>>>>>>> I have a question, Kenn. Given that cherry-picking this might be
>>>>>>>>>>>> a rather big change, can we reconsider cutting the 2.29.0 branch
>>>>>>>>>>>> again after the updated gRPC version gets merged, and move the
>>>>>>>>>>>> issues already marked as fixed in 2.30.0 to 2.29.0? That seems
>>>>>>>>>>>> like an easier upgrade path (and we would get some nice
>>>>>>>>>>>> fixes/improvements, like official Spark 3 support, for free in
>>>>>>>>>>>> the release).
>>>>>>>>>>>>
>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Mar 24, 2021 at 8:06 PM Tomo Suzuki <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> >
>>>>>>>>>>>> > Update: I observe that the Java precommit check is unstable in
>>>>>>>>>>>> > the PR to upgrade the vendored gRPC (compared with a PR with an
>>>>>>>>>>>> > empty change). The failures are not consistent; sometimes it
>>>>>>>>>>>> > succeeds and other times it hits timeouts and flaky test failures.
>>>>>>>>>>>> >
>>>>>>>>>>>> > https://github.com/apache/beam/pull/14295#issuecomment-806071087
>>>>>>>>>>>> >
>>>>>>>>>>>> >
>>>>>>>>>>>> > On Mon, Mar 22, 2021 at 10:46 AM Tomo Suzuki <[email protected]> wrote:
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> Thank you for voting; I see the artifact is available in Maven
>>>>>>>>>>>> >> Central. I'll work on the PR to use the published artifact today.
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> https://search.maven.org/artifact/org.apache.beam/beam-vendor-grpc-1_36_0/0.1/jar
>>>>>>>>>>>> >>
>>>>>>>>>>>> >> On Tue, Mar 16, 2021 at 3:07 PM Kenneth Knowles <[email protected]> wrote:
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Update on this: there are some minor issues, and then I'll send
>>>>>>>>>>>> >>> out the RC.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> I think this is worth blocking the 2.29.0 release on, so I will
>>>>>>>>>>>> >>> do this first. We are still eliminating other blockers from
>>>>>>>>>>>> >>> 2.29.0 anyhow.
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> Kenn
>>>>>>>>>>>> >>>
>>>>>>>>>>>> >>> On Mon, Mar 15, 2021 at 7:17 AM Tomo Suzuki <[email protected]> wrote:
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> Hi Beam developers,
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> I'm working on upgrading the vendored gRPC to 1.36.0
>>>>>>>>>>>> >>>> https://issues.apache.org/jira/browse/BEAM-11227 (PR:
>>>>>>>>>>>> >>>> https://github.com/apache/beam/pull/14028)
>>>>>>>>>>>> >>>> Let me know if you have any questions or concerns.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> Background:
>>>>>>>>>>>> >>>> After exchanging messages with Ismaël in BEAM-11227, it seems
>>>>>>>>>>>> >>>> that the ticket, created by some automation, is a false
>>>>>>>>>>>> >>>> positive, but it's nice to use an artifact that is not flagged
>>>>>>>>>>>> >>>> with a CVE.
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> Kenn offered to act as the release manager of the vendored
>>>>>>>>>>>> >>>> artifact (as in https://s.apache.org/beam-release-vendored-artifacts).
>>>>>>>>>>>> >>>>
>>>>>>>>>>>> >>>> --
>>>>>>>>>>>> >>>> Regards,
>>>>>>>>>>>> >>>> Tomo


-- 
Regards,
Tomo
