[
https://issues.apache.org/jira/browse/IGNITE-12793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sergey Antonov updated IGNITE-12793:
------------------------------------
Description:
I've recently tried to apply Ilya's idea
(https://issues.apache.org/jira/browse/IGNITE-12663) of minimizing thread pools
and tried to set system pool to 3 in my own tests.
It caused deadlock on a client node and I think it can happen not only on such
small pool values.
Details are following:
I'm not using persistence currently (if it matters).
On the client note I use ignite compute to call a job on every server node
(there are 3 server nodes in the tests).
Then I've found in logs:
{noformat}
[10:55:21] : [Step 1/1] [2020-03-13 10:55:21,773]
grid-timeout-worker-#8
[WARN] [o.a.i.i.IgniteKernal] - Possible thread pool starvation detected (no
task completed in last 30000ms, is system thread pool size large enough?)
[10:55:21] : [Step 1/1] ^-- System thread pool [active=3, idle=0, qSize=9]
{noformat}
I see in threaddumps that all 3 system pool workers do the same - processing of
job responses:
{noformat}
"sys-#26" #605 daemon prio=5 os_prio=0 tid=0x0000000064a0a800 nid=0x1f34
waiting on condition [0x000000007b91d000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at
org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:749)
at
org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.metadata(CacheObjectBinaryProcessorImpl.java:250)
at
org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1169)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.getOrCreateSchema(BinaryReaderExImpl.java:2005)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:285)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:184)
at
org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
at
org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
at
org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1982)
at
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:702)
at
org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:187)
at
org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:887)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1762)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
at
org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
at
org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
at
org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
at
org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:306)
at
org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:100)
at
org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:80)
at
org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10493)
at
org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:828)
at
org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1134)
{noformat}
As I found analyzing this stack trace, unmarshalling a user object the first
time(per type) causes Binary metadata request (despite I've registered this
type in BinaryConfiguration.setTypeConfiguration)
And all this futures will be completed after consequent MetadataResponseMessage
will be received and processed on the client node. But
MetadataResponseMessage(GridTopic.TOPIC_METADATA_REQ) is also processing in
system pool. (I see that method GridIoManager#processRegularMessage routes it
to the System Pool).
So it causes deadlock as the Sytem Pool is already full.
*Update*
I reproduced this on master on standard system pool size(see comments). Full
threaddump is attached.
was:
I've recently tried to apply Ilya's idea
(https://issues.apache.org/jira/browse/IGNITE-12663) of minimizing thread pools
and tried to set system pool to 3 in my own tests.
It caused deadlock on a client node and I think it can happen not only on such
small pool values.
Details are following:
I'm not using persistence currently (if it matters).
On the client note I use ignite compute to call a job on every server node
(there are 3 server nodes in the tests).
Then I've found in logs:
{{[10:55:21] : [Step 1/1] [2020-03-13 10:55:21,773]
{grid-timeout-worker-#8}
[WARN] [o.a.i.i.IgniteKernal] - Possible thread pool starvation detected (no
task completed in last 30000ms, is system thread pool size large enough?)
[10:55:21] : [Step 1/1] ^-- System thread pool [active=3, idle=0, qSize=9]}}
I see in threaddumps that all 3 system pool workers do the same - processing of
job responses:
{{ "sys-#26" #605 daemon prio=5 os_prio=0 tid=0x0000000064a0a800 nid=0x1f34
waiting on condition [0x000000007b91d000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at
org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:749)
at
org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.metadata(CacheObjectBinaryProcessorImpl.java:250)
at
org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1169)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.getOrCreateSchema(BinaryReaderExImpl.java:2005)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:285)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:184)
at
org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
at
org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
at
org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1982)
at
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:702)
at
org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:187)
at
org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:887)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1762)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
at
org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
at
org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
at
org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
at
org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:306)
at
org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:100)
at
org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:80)
at
org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10493)
at
org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:828)
at
org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1134)
}}
As I found analyzing this stack trace, unmarshalling a user object the first
time(per type) causes Binary metadata request (despite I've registered this
type in BinaryConfiguration.setTypeConfiguration)
And all this futures will be completed after consequent MetadataResponseMessage
will be received and processed on the client node. But
MetadataResponseMessage(GridTopic.TOPIC_METADATA_REQ) is also processing in
system pool. (I see that method GridIoManager#processRegularMessage routes it
to the System Pool).
So it causes deadlock as the Sytem Pool is already full.
*Update*
I reproduced this on master on standard system pool size(see comments). Full
threaddump is attached.
> Deadlock in the System Pool on Metadata processing
> --------------------------------------------------
>
> Key: IGNITE-12793
> URL: https://issues.apache.org/jira/browse/IGNITE-12793
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.8, 2.7.6
> Reporter: Sergey Kosarev
> Assignee: Sergey Chugunov
> Priority: Major
> Fix For: 2.9
>
> Attachments: ignite-12793-threaddump.txt
>
>
> I've recently tried to apply Ilya's idea
> (https://issues.apache.org/jira/browse/IGNITE-12663) of minimizing thread
> pools and tried to set system pool to 3 in my own tests.
> It caused deadlock on a client node and I think it can happen not only on
> such small pool values.
> Details are following:
> I'm not using persistence currently (if it matters).
> On the client note I use ignite compute to call a job on every server node
> (there are 3 server nodes in the tests).
> Then I've found in logs:
> {noformat}
> [10:55:21] : [Step 1/1] [2020-03-13 10:55:21,773]
> grid-timeout-worker-#8
> [WARN] [o.a.i.i.IgniteKernal] - Possible thread pool starvation detected (no
> task completed in last 30000ms, is system thread pool size large enough?)
> [10:55:21] : [Step 1/1] ^-- System thread pool [active=3, idle=0, qSize=9]
> {noformat}
> I see in threaddumps that all 3 system pool workers do the same - processing
> of job responses:
> {noformat}
> "sys-#26" #605 daemon prio=5 os_prio=0 tid=0x0000000064a0a800 nid=0x1f34
> waiting on condition [0x000000007b91d000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
> at
> org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
> at
> org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
> at
> org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:749)
> at
> org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.metadata(CacheObjectBinaryProcessorImpl.java:250)
> at
> org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1169)
> at
> org.apache.ignite.internal.binary.BinaryReaderExImpl.getOrCreateSchema(BinaryReaderExImpl.java:2005)
> at
> org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:285)
> at
> org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:184)
> at
> org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
> at
> org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
> at
> org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
> at
> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
> at
> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
> at
> org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1982)
> at
> org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:702)
> at
> org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:187)
> at
> org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:887)
> at
> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1762)
> at
> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
> at
> org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
> at
> org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
> at
> org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
> at
> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
> at
> org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
> at
> org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:306)
> at
> org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:100)
> at
> org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:80)
> at
> org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10493)
> at
> org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:828)
> at
> org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1134)
>
> {noformat}
> As I found analyzing this stack trace, unmarshalling a user object the first
> time(per type) causes Binary metadata request (despite I've registered this
> type in BinaryConfiguration.setTypeConfiguration)
> And all this futures will be completed after consequent
> MetadataResponseMessage will be received and processed on the client node.
> But MetadataResponseMessage(GridTopic.TOPIC_METADATA_REQ) is also processing
> in system pool. (I see that method GridIoManager#processRegularMessage routes
> it to the System Pool).
> So it causes deadlock as the Sytem Pool is already full.
> *Update*
> I reproduced this on master on standard system pool size(see comments). Full
> threaddump is attached.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)