Hello Sergey,

Your analysis looks valid to me. We definitely need to investigate this deadlock and find out how to fix it.
Could you create a ticket and write a test that reproduces the issue with sufficient probability?

Thanks!

On Mon, Mar 16, 2020 at 8:22 PM Sergey-A Kosarev <sergey-a.kosa...@db.com> wrote:

> Classification: Public
>
> Hi,
> I've recently tried to apply Ilya's idea
> (https://issues.apache.org/jira/browse/IGNITE-12663) of minimizing thread
> pools and tried to set the system pool size to 3 in my own tests.
> It caused a deadlock on a client node, and I think it can happen not only
> with such small pool sizes.
>
> Details follow:
> I'm not using persistence currently (if it matters).
> On the client node I use Ignite compute to call a job on every server
> node (there are 3 server nodes in the tests).
>
> Then I found this in the logs:
>
> [10:55:21] : [Step 1/1] [2020-03-13 10:55:21,773] { grid-timeout-worker-#8} [WARN] [o.a.i.i.IgniteKernal] - Possible thread pool starvation detected (no task completed in last 30000ms, is system thread pool size large enough?)
> [10:55:21] : [Step 1/1]     ^-- System thread pool [active=3, idle=0, qSize=9]
>
> I see in the thread dumps that all 3 system pool workers are doing the same
> thing, processing job responses:
>
> "sys-#26" #605 daemon prio=5 os_prio=0 tid=0x0000000064a0a800 nid=0x1f34 waiting on condition [0x000000007b91d000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
>         at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:749)
>         at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.metadata(CacheObjectBinaryProcessorImpl.java:250)
>         at org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1169)
>         at org.apache.ignite.internal.binary.BinaryReaderExImpl.getOrCreateSchema(BinaryReaderExImpl.java:2005)
>         at org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:285)
>         at org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:184)
>         at org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
>         at org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
>         at org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
>         at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
>         at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
>         at org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1982)
>         at org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:702)
>         at org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:187)
>         at org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:887)
>         at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1762)
>         at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
>         at org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
>         at org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
>         at org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
>         at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
>         at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
>         at org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:306)
>         at org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:100)
>         at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:80)
>         at org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10493)
>         at org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:828)
>         at org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1134)
>
> As I found while analyzing this stack trace, unmarshalling a user object
> for the first time (per type) causes a binary metadata request (even though
> I've registered this type in BinaryConfiguration.setTypeConfiguration).
>
> All these futures will be completed only after the corresponding
> MetadataResponseMessage is received and processed on the client node.
>
> But MetadataResponseMessage (GridTopic.TOPIC_METADATA_REQ) is also
> processed in the system pool.
> (I see that the method GridIoManager#processRegularMessage routes it to the
> system pool.)
> So it causes a deadlock, as the system pool is already full.
>
> I'd appreciate any feedback on the topic.
> Am I just shooting myself in the foot, or should I create a ticket for this?
> Is it correct that both job responses and metadata responses are processed
> in the system pool, and if so, how should this case be handled?
> I suppose that with bigger pool sizes there is still a chance of getting
> into such a situation if many concurrent jobs are invoked from the same
> client node. Or is there some golden rule to follow, something like: no
> more than (capacity of the system pool - 1) concurrent jobs should be
> invoked from the same node?
>
> Kind regards,
> Sergey Kosarev
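
For reference, the starvation pattern described in the quoted message can be shown without Ignite at all. The sketch below is only an analogy built on a plain java.util.concurrent fixed pool: the class name, the method `run`, the pool sizes and the timeouts are all assumptions made up for illustration, and none of it is Ignite code. Each "handler" task stands in for a job-response handler that blocks on a future, and the task it submits stands in for the metadata response that has to be processed by the same pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolStarvationSketch {
    /**
     * Runs {@code handlers} tasks on a fixed pool of {@code poolSize} threads.
     * Each task submits a follow-up task to the SAME pool and blocks on it,
     * giving up after {@code timeoutMs}. Returns how many handlers saw their
     * follow-up task complete before the timeout.
     */
    public static int run(int poolSize, int handlers, long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        // Make every running handler wait until the pool is as busy as it can get,
        // so the outcome does not depend on scheduling luck.
        CountDownLatch started = new CountDownLatch(Math.min(poolSize, handlers));
        AtomicInteger completed = new AtomicInteger();
        List<Future<?>> outer = new ArrayList<>();
        try {
            for (int i = 0; i < handlers; i++) {
                outer.add(pool.submit(() -> {
                    started.countDown();
                    started.await();
                    // Stand-in for the metadata response: it needs a free worker.
                    Future<?> metadata = pool.submit(() -> { });
                    try {
                        metadata.get(timeoutMs, TimeUnit.MILLISECONDS);
                        completed.incrementAndGet();
                    } catch (TimeoutException e) {
                        // No free worker picked up the follow-up task in time.
                    }
                    return null;
                }));
            }
            for (Future<?> f : outer) {
                f.get();
            }
            return completed.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("pool=3, handlers=3 -> completed " + run(3, 3, 500));
        System.out.println("pool=4, handlers=3 -> completed " + run(4, 3, 5000));
    }
}
```

With three handlers on a pool of three, every worker is parked on its own future and the follow-up tasks stay in the queue, which mirrors the active=3, idle=0, qSize>0 picture in the log above. One spare worker is enough to drain the queue in this toy setup, but that says nothing definite about what the right fix in Ignite would be.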