[ 
https://issues.apache.org/jira/browse/IGNITE-12793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Kosarev updated IGNITE-12793:
------------------------------------
    Description: 
I've recently tried to apply Ilya's idea 
(https://issues.apache.org/jira/browse/IGNITE-12663) of minimizing thread pools 
and tried to set system pool to 3 in my own tests.
 It caused deadlock on a client node and I think it can happen not only on such 
small pool values.

Details are following:
 I'm not using persistence currently (if it matters).
 On the client note I use ignite compute to call a job on every server node 
(there are 3 server nodes in the tests).

Then I've found in logs:
 {{[10:55:21] : [Step 1/1] [2020-03-13 10:55:21,773]

{grid-timeout-worker-#8}

[WARN] [o.a.i.i.IgniteKernal] - Possible thread pool starvation detected (no 
task completed in last 30000ms, is system thread pool size large enough?)
 [10:55:21] : [Step 1/1] ^-- System thread pool [active=3, idle=0, qSize=9]}}

I see in threaddumps that all 3 system pool workers do the same - processing of 
job responses:
 {{ "sys-#26" #605 daemon prio=5 os_prio=0 tid=0x0000000064a0a800 nid=0x1f34 
waiting on condition [0x000000007b91d000]
 java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
 at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
 at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
 at 
org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:749)
 at 
org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.metadata(CacheObjectBinaryProcessorImpl.java:250)
 at 
org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1169)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.getOrCreateSchema(BinaryReaderExImpl.java:2005)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:285)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:184)
 at 
org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
 at 
org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
 at 
org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1982)
 at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:702)
 at 
org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:187)
 at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:887)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1762)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
 at 
org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
 at 
org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
 at 
org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
 at 
org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:306)
 at 
org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:100)
 at 
org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:80)
 at 
org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10493)
 at 
org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:828)
 at 
org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1134)
 }}

As I found analyzing this stack trace, unmarshalling a user object the first 
time(per type) causes Binary metadata request (despite I've registered this 
type in BinaryConfiguration.setTypeConfiguration)

And all this futures will be completed after consequent MetadataResponseMessage 
will be received and processed on the client node. But 
MetadataResponseMessage(GridTopic.TOPIC_METADATA_REQ) is also processing in 
system pool. (I see that method GridIoManager#processRegularMessage routes it 
to the System Pool). 
 So it causes deadlock as the Sytem Pool is already full.

*Update*

**I reproduced this on master(see comments). Full threaddump is attached.

 

  was:
I've recently tried to apply Ilya's idea 
(https://issues.apache.org/jira/browse/IGNITE-12663) of minimizing thread pools 
and tried to set system pool to 3 in my own tests.
 It caused deadlock on a client node and I think it can happen not only on such 
small pool values.

Details are following:
 I'm not using persistence currently (if it matters).
 On the client note I use ignite compute to call a job on every server node 
(there are 3 server nodes in the tests).

Then I've found in logs:
 {{[10:55:21] : [Step 1/1] [2020-03-13 10:55:21,773] {grid-timeout-worker-#8} 
[WARN] [o.a.i.i.IgniteKernal] - Possible thread pool starvation detected (no 
task completed in last 30000ms, is system thread pool size large enough?)
 [10:55:21] : [Step 1/1] ^-- System thread pool [active=3, idle=0, qSize=9]}}

I see in threaddumps that all 3 system pool workers do the same - processing of 
job responses:
 {{ "sys-#26" #605 daemon prio=5 os_prio=0 tid=0x0000000064a0a800 nid=0x1f34 
waiting on condition [0x000000007b91d000]
 java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
 at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
 at 
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
 at 
org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:749)
 at 
org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.metadata(CacheObjectBinaryProcessorImpl.java:250)
 at 
org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1169)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.getOrCreateSchema(BinaryReaderExImpl.java:2005)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:285)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:184)
 at 
org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
 at 
org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
 at 
org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1982)
 at 
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:702)
 at 
org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:187)
 at 
org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:887)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1762)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
 at 
org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
 at 
org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
 at 
org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
 at 
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
 at 
org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:306)
 at 
org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:100)
 at 
org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:80)
 at 
org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10493)
 at 
org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:828)
 at 
org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1134)
 }}

As I found analyzing this stack trace, unmarshalling a user object the first 
time(per type) causes Binary metadata request (despite I've registered this 
type in BinaryConfiguration.setTypeConfiguration)

And all this futures will be completed after consequent MetadataResponseMessage 
will be received and processed on the client node. But 
MetadataResponseMessage(GridTopic.TOPIC_METADATA_REQ) is also processing in 
system pool. (I see that method GridIoManager#processRegularMessage routes it 
to the System Pool). 
 So it causes deadlock as the Sytem Pool is already full.


> Deadlock in the System Pool on Metadata processing
> --------------------------------------------------
>
>                 Key: IGNITE-12793
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12793
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.7.6
>            Reporter: Sergey Kosarev
>            Priority: Major
>         Attachments: ignite-12793-threaddump.txt
>
>
> I've recently tried to apply Ilya's idea 
> (https://issues.apache.org/jira/browse/IGNITE-12663) of minimizing thread 
> pools and tried to set system pool to 3 in my own tests.
>  It caused deadlock on a client node and I think it can happen not only on 
> such small pool values.
> Details are following:
>  I'm not using persistence currently (if it matters).
>  On the client note I use ignite compute to call a job on every server node 
> (there are 3 server nodes in the tests).
> Then I've found in logs:
>  {{[10:55:21] : [Step 1/1] [2020-03-13 10:55:21,773]
> {grid-timeout-worker-#8}
> [WARN] [o.a.i.i.IgniteKernal] - Possible thread pool starvation detected (no 
> task completed in last 30000ms, is system thread pool size large enough?)
>  [10:55:21] : [Step 1/1] ^-- System thread pool [active=3, idle=0, qSize=9]}}
> I see in threaddumps that all 3 system pool workers do the same - processing 
> of job responses:
>  {{ "sys-#26" #605 daemon prio=5 os_prio=0 tid=0x0000000064a0a800 nid=0x1f34 
> waiting on condition [0x000000007b91d000]
>  java.lang.Thread.State: WAITING (parking)
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>  at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
>  at 
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
>  at 
> org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:749)
>  at 
> org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.metadata(CacheObjectBinaryProcessorImpl.java:250)
>  at 
> org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1169)
>  at 
> org.apache.ignite.internal.binary.BinaryReaderExImpl.getOrCreateSchema(BinaryReaderExImpl.java:2005)
>  at 
> org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:285)
>  at 
> org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:184)
>  at 
> org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
>  at 
> org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
>  at 
> org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
>  at 
> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
>  at 
> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
>  at 
> org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1982)
>  at 
> org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:702)
>  at 
> org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:187)
>  at 
> org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:887)
>  at 
> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1762)
>  at 
> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
>  at 
> org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
>  at 
> org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
>  at 
> org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
>  at 
> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
>  at 
> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
>  at 
> org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:306)
>  at 
> org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:100)
>  at 
> org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:80)
>  at 
> org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10493)
>  at 
> org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:828)
>  at 
> org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1134)
>  }}
> As I found analyzing this stack trace, unmarshalling a user object the first 
> time(per type) causes Binary metadata request (despite I've registered this 
> type in BinaryConfiguration.setTypeConfiguration)
> And all this futures will be completed after consequent 
> MetadataResponseMessage will be received and processed on the client node. 
> But MetadataResponseMessage(GridTopic.TOPIC_METADATA_REQ) is also processing 
> in system pool. (I see that method GridIoManager#processRegularMessage routes 
> it to the System Pool). 
>  So it causes deadlock as the Sytem Pool is already full.
> *Update*
> **I reproduced this on master(see comments). Full threaddump is attached.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to