[ https://issues.apache.org/jira/browse/IGNITE-12793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Kosarev updated IGNITE-12793: ------------------------------------ Description: I've recently tried to apply Ilya's idea (https://issues.apache.org/jira/browse/IGNITE-12663) of minimizing thread pools and tried to set system pool to 3 in my own tests. It caused deadlock on a client node and I think it can happen not only on such small pool values. Details are following: I'm not using persistence currently (if it matters). On the client note I use ignite compute to call a job on every server node (there are 3 server nodes in the tests). Then I've found in logs: {{[10:55:21] : [Step 1/1] [2020-03-13 10:55:21,773] {grid-timeout-worker-#8} [WARN] [o.a.i.i.IgniteKernal] - Possible thread pool starvation detected (no task completed in last 30000ms, is system thread pool size large enough?) [10:55:21] : [Step 1/1] ^-- System thread pool [active=3, idle=0, qSize=9]}} I see in threaddumps that all 3 system pool workers do the same - processing of job responses: {{ "sys-#26" #605 daemon prio=5 os_prio=0 tid=0x0000000064a0a800 nid=0x1f34 waiting on condition [0x000000007b91d000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304) at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177) at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140) at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:749) at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.metadata(CacheObjectBinaryProcessorImpl.java:250) at org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1169) at org.apache.ignite.internal.binary.BinaryReaderExImpl.getOrCreateSchema(BinaryReaderExImpl.java:2005) at org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:285) at org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:184) at org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797) at org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160) at org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091) at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914) at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714) at org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1982) at org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:702) at org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:187) at org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:887) at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1762) at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714) at org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797) at org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160) at org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091) at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914) at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714) at org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:306) at org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:100) at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:80) at org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10493) at org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:828) at org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1134) }} As I found analyzing this stack trace, unmarshalling a user object the first time(per type) causes Binary metadata request (despite I've registered this type in BinaryConfiguration.setTypeConfiguration) And all this futures will be completed after consequent MetadataResponseMessage will be received and processed on the client node. But MetadataResponseMessage(GridTopic.TOPIC_METADATA_REQ) is also processing in system pool. (I see that method GridIoManager#processRegularMessage routes it to the System Pool). So it causes deadlock as the Sytem Pool is already full. *Update* **I reproduced this on master(see comments). Full threaddump is attached. was: I've recently tried to apply Ilya's idea (https://issues.apache.org/jira/browse/IGNITE-12663) of minimizing thread pools and tried to set system pool to 3 in my own tests. It caused deadlock on a client node and I think it can happen not only on such small pool values. Details are following: I'm not using persistence currently (if it matters). On the client note I use ignite compute to call a job on every server node (there are 3 server nodes in the tests). Then I've found in logs: {{[10:55:21] : [Step 1/1] [2020-03-13 10:55:21,773] {grid-timeout-worker-#8} [WARN] [o.a.i.i.IgniteKernal] - Possible thread pool starvation detected (no task completed in last 30000ms, is system thread pool size large enough?) [10:55:21] : [Step 1/1] ^-- System thread pool [active=3, idle=0, qSize=9]}} I see in threaddumps that all 3 system pool workers do the same - processing of job responses: {{ "sys-#26" #605 daemon prio=5 os_prio=0 tid=0x0000000064a0a800 nid=0x1f34 waiting on condition [0x000000007b91d000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304) at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177) at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140) at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:749) at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.metadata(CacheObjectBinaryProcessorImpl.java:250) at org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1169) at org.apache.ignite.internal.binary.BinaryReaderExImpl.getOrCreateSchema(BinaryReaderExImpl.java:2005) at org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:285) at org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:184) at org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797) at org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160) at org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091) at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914) at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714) at org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1982) at org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:702) at org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:187) at org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:887) at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1762) at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714) at org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797) at org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160) at org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091) at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914) at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714) at org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:306) at org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:100) at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:80) at org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10493) at org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:828) at org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1134) }} As I found analyzing this stack trace, unmarshalling a user object the first time(per type) causes Binary metadata request (despite I've registered this type in BinaryConfiguration.setTypeConfiguration) And all this futures will be completed after consequent MetadataResponseMessage will be received and processed on the client node. But MetadataResponseMessage(GridTopic.TOPIC_METADATA_REQ) is also processing in system pool. (I see that method GridIoManager#processRegularMessage routes it to the System Pool). So it causes deadlock as the Sytem Pool is already full. > Deadlock in the System Pool on Metadata processing > -------------------------------------------------- > > Key: IGNITE-12793 > URL: https://issues.apache.org/jira/browse/IGNITE-12793 > Project: Ignite > Issue Type: Bug > Affects Versions: 2.7.6 > Reporter: Sergey Kosarev > Priority: Major > Attachments: ignite-12793-threaddump.txt > > > I've recently tried to apply Ilya's idea > (https://issues.apache.org/jira/browse/IGNITE-12663) of minimizing thread > pools and tried to set system pool to 3 in my own tests. > It caused deadlock on a client node and I think it can happen not only on > such small pool values. > Details are following: > I'm not using persistence currently (if it matters). > On the client note I use ignite compute to call a job on every server node > (there are 3 server nodes in the tests). > Then I've found in logs: > {{[10:55:21] : [Step 1/1] [2020-03-13 10:55:21,773] > {grid-timeout-worker-#8} > [WARN] [o.a.i.i.IgniteKernal] - Possible thread pool starvation detected (no > task completed in last 30000ms, is system thread pool size large enough?) > [10:55:21] : [Step 1/1] ^-- System thread pool [active=3, idle=0, qSize=9]}} > I see in threaddumps that all 3 system pool workers do the same - processing > of job responses: > {{ "sys-#26" #605 daemon prio=5 os_prio=0 tid=0x0000000064a0a800 nid=0x1f34 > waiting on condition [0x000000007b91d000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177) > at > org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140) > at > org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:749) > at > org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.metadata(CacheObjectBinaryProcessorImpl.java:250) > at > org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1169) > at > org.apache.ignite.internal.binary.BinaryReaderExImpl.getOrCreateSchema(BinaryReaderExImpl.java:2005) > at > org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:285) > at > org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:184) > at > org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797) > at > org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160) > at > org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091) > at > org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914) > at > org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714) > at > org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1982) > at > org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:702) > at > org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:187) > at > org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:887) > at > org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1762) > at > org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714) > at > org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797) > at > org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160) > at > org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091) > at > org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914) > at > org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714) > at > org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:306) > at > org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:100) > at > org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:80) > at > org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10493) > at > org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:828) > at > org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1134) > }} > As I found analyzing this stack trace, unmarshalling a user object the first > time(per type) causes Binary metadata request (despite I've registered this > type in BinaryConfiguration.setTypeConfiguration) > And all this futures will be completed after consequent > MetadataResponseMessage will be received and processed on the client node. > But MetadataResponseMessage(GridTopic.TOPIC_METADATA_REQ) is also processing > in system pool. (I see that method GridIoManager#processRegularMessage routes > it to the System Pool). > So it causes deadlock as the Sytem Pool is already full. > *Update* > **I reproduced this on master(see comments). Full threaddump is attached. > -- This message was sent by Atlassian Jira (v8.3.4#803005)