Hi,
I've recently tried to apply Ilya's idea
(https://issues.apache.org/jira/browse/IGNITE-12663) of minimizing thread pools
and set the system thread pool size to 3 in my own tests.
It caused a deadlock on a client node, and I think it can happen not only with
such small pool values.
The details are as follows:
I'm not using persistence at the moment (if it matters).
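For reference, the client in my tests is started roughly like this (a simplified
sketch, not the exact test code):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;

    IgniteConfiguration cfg = new IgniteConfiguration();
    cfg.setClientMode(true);
    // Shrink the system pool as suggested in IGNITE-12663;
    // 3 is the value that reproduced the deadlock.
    cfg.setSystemThreadPoolSize(3);
    Ignite ignite = Ignition.start(cfg);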
On the client node I use Ignite compute to call a job on every server node
(there are 3 server nodes in the tests).
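The compute call itself looks roughly like this (again a sketch; MyJob and
MyResult are placeholders for the real job and result classes):

    // MyJob implements IgniteCallable<MyResult>; broadcast it to all
    // server nodes and collect the results on the client.
    IgniteCompute compute = ignite.compute(ignite.cluster().forServers());
    Collection<MyResult> results = compute.broadcast(new MyJob());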
Then I found the following in the logs:
[10:55:21] : [Step 1/1] [2020-03-13 10:55:21,773] {grid-timeout-worker-#8} [WARN] [o.a.i.i.IgniteKernal] - Possible thread pool starvation detected (no task completed in last 30000ms, is system thread pool size large enough?)
[10:55:21] : [Step 1/1] ^-- System thread pool [active=3, idle=0, qSize=9]
I see in the thread dumps that all 3 system pool workers are doing the same
thing - processing job responses:
"sys-#26" #605 daemon prio=5 os_prio=0 tid=0x0000000064a0a800 nid=0x1f34
waiting on condition [0x000000007b91d000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at
java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at
org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at
org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:749)
at
org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.metadata(CacheObjectBinaryProcessorImpl.java:250)
at
org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1169)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.getOrCreateSchema(BinaryReaderExImpl.java:2005)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:285)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:184)
at
org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
at
org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
at
org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1982)
at
org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:702)
at
org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:187)
at
org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:887)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1762)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
at
org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
at
org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
at
org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
at
org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
at
org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:306)
at
org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:100)
at
org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:80)
at
org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10493)
at
org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:828)
at
org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1134)
As I found while analyzing this stack trace, unmarshalling a user object for the
first time (per type) triggers a binary metadata request, even though I've
registered this type via BinaryConfiguration.setTypeConfigurations (a sketch of
the registration is further below).
All these futures are completed only after the corresponding
MetadataResponseMessage is received and processed on the client node.
But MetadataResponseMessage (GridTopic.TOPIC_METADATA_REQ) is also processed in
the system pool (I see that GridIoManager#processRegularMessage routes it to the
system pool).
So it causes a deadlock, because the system pool is already full.
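For completeness, the type registration mentioned above is done roughly like
this (a sketch, continuing the configuration snippet from the beginning of this
mail; MyResult stands in for the real class):

    // Register the result type up front, hoping to avoid metadata requests at runtime.
    BinaryConfiguration binCfg = new BinaryConfiguration();
    binCfg.setTypeConfigurations(Collections.singletonList(
        new BinaryTypeConfiguration(MyResult.class.getName())));
    cfg.setBinaryConfiguration(binCfg);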
I will appreciate any feedback on the topic.
Am I just shooting myself in the foot, or should I create a ticket for this?
Is it correct that both job responses and metadata responses are processed in
the system pool, and if so, how is this case supposed to be handled?
I suppose that with bigger pool sizes there is still a chance of getting into
such a situation if many concurrent jobs are invoked from the same client node,
or is there some golden rule to follow - something like no more than (capacity
of the system pool - 1) concurrent jobs should be invoked from the same node (a
rough sketch of what I mean is below)?
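If such a rule is the way to go, I could imagine throttling task submission on
the client with a plain semaphore, roughly like this (purely my own workaround
sketch, nothing Ignite provides):

    // java.util.concurrent.Semaphore: keep at most (system pool size - 1)
    // compute tasks in flight from this client.
    Semaphore permits = new Semaphore(cfg.getSystemThreadPoolSize() - 1);

    permits.acquire();
    compute.broadcastAsync(new MyJob())
           .listen(fut -> permits.release());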
Kind regards,
Sergey Kosarev