Hi,

I've recently tried to apply Ilya's idea (https://issues.apache.org/jira/browse/IGNITE-12663) of minimizing thread pools and set the system pool size to 3 in my own tests. It caused a deadlock on a client node, and I think it can happen not only with such small pool values.
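For context, here is a minimal sketch of the client-side setup I'm describing (class names such as DeadlockReproducer, MyJob and MyResult are placeholders, not my actual test code):

import java.io.Serializable;
import java.util.Collection;
import java.util.Collections;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.binary.BinaryTypeConfiguration;
import org.apache.ignite.configuration.BinaryConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.lang.IgniteCallable;

public class DeadlockReproducer {
    /** Placeholder for my real result type returned by the job. */
    public static class MyResult implements Serializable {
        public Collection<String> payload;
    }

    /** Placeholder for my real job; it runs on every server node. */
    public static class MyJob implements IgniteCallable<MyResult> {
        @Override public MyResult call() {
            MyResult res = new MyResult();
            res.payload = Collections.singleton("result");
            return res;
        }
    }

    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration()
            .setClientMode(true)
            // Ilya's idea from IGNITE-12663: shrink the system pool.
            .setSystemThreadPoolSize(3)
            // The result type is registered up front, as mentioned below.
            .setBinaryConfiguration(new BinaryConfiguration()
                .setTypeConfigurations(Collections.singleton(
                    new BinaryTypeConfiguration(MyResult.class.getName()))));

        try (Ignite client = Ignition.start(cfg)) {
            // Call the job on every server node (3 server nodes in my tests).
            // The client must deserialize MyResult when the job responses arrive.
            Collection<MyResult> results =
                client.compute(client.cluster().forServers()).broadcast(new MyJob());

            System.out.println("Got " + results.size() + " results");
        }
    }
}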
Details follow. I'm not using persistence at the moment (if it matters). On the client node I use Ignite Compute to call a job on every server node (there are 3 server nodes in the tests). Then I found this in the logs:

[10:55:21] : [Step 1/1] [2020-03-13 10:55:21,773] { grid-timeout-worker-#8} [WARN] [o.a.i.i.IgniteKernal] - Possible thread pool starvation detected (no task completed in last 30000ms, is system thread pool size large enough?)
[10:55:21] : [Step 1/1] ^-- System thread pool [active=3, idle=0, qSize=9]

In the thread dumps I see that all 3 system pool workers are doing the same thing - processing job responses:

"sys-#26" #605 daemon prio=5 os_prio=0 tid=0x0000000064a0a800 nid=0x1f34 waiting on condition [0x000000007b91d000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
    at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:749)
    at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.metadata(CacheObjectBinaryProcessorImpl.java:250)
    at org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1169)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.getOrCreateSchema(BinaryReaderExImpl.java:2005)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:285)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:184)
    at org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
    at org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
    at org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1982)
    at org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:702)
    at org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:187)
    at org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:887)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1762)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
    at org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
    at org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
    at org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
    at org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:306)
    at org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:100)
    at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:80)
    at org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10493)
    at org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:828)
    at org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1134)

Analyzing this stack trace, I found that unmarshalling a user object for the first time (per type) causes a binary metadata request (even though I registered the type via BinaryConfiguration.setTypeConfigurations), and the corresponding futures are only completed after the subsequent MetadataResponseMessage is received and processed on the client node. But MetadataResponseMessage (GridTopic.TOPIC_METADATA_REQ) is also processed in the system pool (I see that GridIoManager#processRegularMessage routes it to the system pool). So it causes a deadlock, as the system pool is already full.
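To make the pattern concrete, here is a self-contained illustration (plain JDK code, not Ignite internals) of the kind of starvation I believe is happening: every pool thread blocks on a future that can only be completed by another task submitted to the same pool, so that task never runs.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSelfDeadlock {
    public static void main(String[] args) {
        int poolSize = 3; // same size as my system pool
        ExecutorService sysPool = Executors.newFixedThreadPool(poolSize);

        // Stands in for the binary metadata future that the job-response
        // handlers are parked on (GridFutureAdapter.get() in the dump above).
        CompletableFuture<Void> metadataFut = new CompletableFuture<>();

        // "Job responses": each one occupies a pool thread and waits for metadata.
        for (int i = 0; i < poolSize; i++)
            sysPool.submit(() -> { metadataFut.join(); });

        // "MetadataResponseMessage": routed to the same pool, but all workers are
        // already blocked, so this task never runs and the future never completes.
        sysPool.submit(() -> { metadataFut.complete(null); });

        // The JVM now hangs forever with all pool threads parked.
    }
}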
I would appreciate any feedback on this. Am I just shooting myself in the foot, or should I file a ticket for this? Is it correct that both job responses and metadata responses are processed in the system pool, and if so, how is this case supposed to be handled? I suppose that even with bigger pool sizes there is still a chance of hitting this situation if many concurrent jobs are invoked from the same client node. Or is there some golden rule to follow - something like invoking no more than (system pool size - 1) concurrent jobs from the same node?

Kind regards,
Sergey Kosarev