Thanks for the observations; I'm still looking into the issue. As Alexey mentioned, the flake happens very rarely, and I haven't found any regular pattern of occurrences. I'm looking into the possibility that it is a race condition, as suggested, but the error seems to happen when the first messages are read, before the second thread is started.
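To make the close-then-read hypothesis from the quoted thread below concrete, here is a minimal, self-contained Java sketch. It does not use JmsIO or the JMS API. `SketchConsumer` and `SketchReader` are hypothetical stand-ins, and the sketch assumes that the reader releases its consumer by nulling the field in close() and dereferences that field in advance(). Under that assumption, the NullPointerException reproduces deterministically on a single thread, which is consistent with the failure appearing before the second thread starts.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Entry point first, so the file also works with the single-file source launcher.
class CloseThenReadSketch {
    public static void main(String[] args) {
        SketchConsumer consumer = new SketchConsumer();
        consumer.enqueue("m1");
        consumer.enqueue("m2");

        SketchReader reader = new SketchReader(consumer);
        // First read succeeds while the consumer is still open.
        System.out.println(reader.advance() + " " + reader.getCurrent());

        // Closing before the next read leaves the consumer field null.
        reader.close();
        try {
            reader.advance();
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException after close()");
        }
    }
}

/** Hypothetical stand-in for a message consumer; not the JMS MessageConsumer API. */
class SketchConsumer {
    private final Deque<String> queue = new ArrayDeque<>();

    void enqueue(String message) {
        queue.add(message);
    }

    /** Non-blocking receive: returns null when no message is available. */
    String receiveNoWait() {
        return queue.poll();
    }
}

/** Hypothetical reader that, by assumption, drops its consumer reference on close(). */
class SketchReader {
    private SketchConsumer consumer;
    private String current;

    SketchReader(SketchConsumer consumer) {
        this.consumer = consumer;
    }

    /** Reads the next message; throws NullPointerException if close() already ran. */
    boolean advance() {
        String message = consumer.receiveNoWait(); // NPE here when consumer == null
        if (message == null) {
            return false;
        }
        current = message;
        return true;
    }

    String getCurrent() {
        return current;
    }

    void close() {
        consumer = null; // assumption: close() releases the consumer by nulling the field
    }
}
```

Running it prints `true m1` followed by `NullPointerException after close()`. If the real reader behaves differently on close (for example, a closed consumer that throws a different exception), this sketch would not match the CI logs.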
On Wed, Oct 6, 2021 at 12:04 PM Alexey Romanenko <[email protected]> wrote:

> Looking at this test ("testCheckpointMarkSafety()"), I'm not sure that
> it's thread-safe to use the same instance of JmsIO.UnboundedJmsReader in
> another thread. It may cause some race conditions there, but it seems to
> happen quite rarely.
>
> --
> Alexey
>
> On 5 Oct 2021, at 21:24, JB Onofré <[email protected]> wrote:
>
> > Hi
> >
> > I will take a look. That's probably a race condition with the broker
> > service.
> >
> > Regards
> > JB
> >
> > On 5 Oct 2021, at 21:21, Miguel Anzo Palomo <[email protected]> wrote:
> >
> > > Hi, I've been working on finding out why this issue
> > > <https://issues.apache.org/jira/browse/BEAM-8453> (a flaky test in
> > > JmsIO) is happening. The logs in this example
> > > <https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18732/testReport/org.apache.beam.sdk.io.jms/JmsIOTest/testCheckpointMarkSafety/>
> > > indicate that the problem is a NullPointerException, specifically in this
> > > receiveNoWait() operation
> > > <https://github.com/apache/beam/blob/9a4cdfba601bae9165928d1a4df8035785b4c871/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/JmsIO.java#L503>
> > > at this
> > > <https://github.com/apache/beam/blob/9a4cdfba601bae9165928d1a4df8035785b4c871/sdks/java/io/jms/src/test/java/org/apache/beam/sdk/io/jms/JmsIOTest.java#L386>
> > > line of the test. The only way I can see a NullPointerException being
> > > raised there is if the consumer is closed at that point, and the only way
> > > I have been able to reproduce a NullPointerException locally at the lines
> > > mentioned in the log is by closing the consumer before that read operation.
> > >
> > > My current idea is that some intermittent problem with ActiveMQ caused
> > > the flaky test at that moment. Is there a way to know whether that was
> > > the case?
> > > Right now I only have two known instances of the flake: one
> > > <https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18732/> on
> > > August 23, and another
> > > <https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/4447/> on
> > > September 9 (that Jenkins link is no longer available). From what I've
> > > been told, the test runs on the cloud compute instance
> > > apache-ci-beam-jenkins.
> > >
> > > Thanks
> > >
> > > --
> > > Miguel Angel Anzo Palomo | WIZELINE
> > > Software Engineer
> > > [email protected]
> > > Remote Office
> > >
> > > *This email and its contents (including any attachments) are being sent
> > > to you on the condition of confidentiality and may be protected by legal
> > > privilege. Access to this email by anyone other than the intended
> > > recipient is unauthorized. If you are not the intended recipient, please
> > > immediately notify the sender by replying to this message and delete the
> > > material immediately from your system. Any further use, dissemination,
> > > distribution or reproduction of this email is strictly prohibited. Further,
> > > no representation is made with respect to any content contained in this
> > > email.*
