Hi,

I've recently tried to apply Ilya's idea (https://issues.apache.org/jira/browse/IGNITE-12663) of minimizing thread pools and set the system pool size to 3 in my own tests. It caused a deadlock on a client node, and I think it can happen not only with such small pool values.
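For context, here is a minimal sketch of the client-side setup I'm describing (class names such as DeadlockReproducer, MyJob and MyResult are placeholders, not my actual test code):

import java.io.Serializable;
import java.util.Collection;
import java.util.Collections;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.binary.BinaryTypeConfiguration;
import org.apache.ignite.configuration.BinaryConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.lang.IgniteCallable;

public class DeadlockReproducer {
    /** Placeholder for my real result type returned by the job. */
    public static class MyResult implements Serializable {
        public Collection<String> payload;
    }

    /** Placeholder for my real job; it runs on every server node. */
    public static class MyJob implements IgniteCallable<MyResult> {
        @Override public MyResult call() {
            MyResult res = new MyResult();
            res.payload = Collections.singleton("result");
            return res;
        }
    }

    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration()
            .setClientMode(true)
            // Ilya's idea from IGNITE-12663: shrink the system pool.
            .setSystemThreadPoolSize(3)
            // The result type is registered up front, as mentioned below.
            .setBinaryConfiguration(new BinaryConfiguration()
                .setTypeConfigurations(Collections.singleton(
                    new BinaryTypeConfiguration(MyResult.class.getName()))));

        try (Ignite client = Ignition.start(cfg)) {
            // Call the job on every server node (3 server nodes in my tests).
            // The client must deserialize MyResult when the job responses arrive.
            Collection<MyResult> results =
                client.compute(client.cluster().forServers()).broadcast(new MyJob());

            System.out.println("Got " + results.size() + " results");
        }
    }
}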
Details follow. I'm not using persistence at the moment (if it matters). On the client node I use Ignite Compute to call a job on every server node (there are 3 server nodes in the tests). Then I found this in the logs:

[10:55:21] : [Step 1/1] [2020-03-13 10:55:21,773] { grid-timeout-worker-#8} [WARN] [o.a.i.i.IgniteKernal] - Possible thread pool starvation detected (no task completed in last 30000ms, is system thread pool size large enough?)
[10:55:21] : [Step 1/1] ^-- System thread pool [active=3, idle=0, qSize=9]

In the thread dumps I see that all 3 system pool workers are doing the same thing - processing job responses:

"sys-#26" #605 daemon prio=5 os_prio=0 tid=0x0000000064a0a800 nid=0x1f34 waiting on condition [0x000000007b91d000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
    at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:749)
    at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.metadata(CacheObjectBinaryProcessorImpl.java:250)
    at org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1169)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.getOrCreateSchema(BinaryReaderExImpl.java:2005)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:285)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:184)
    at org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
    at org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
    at org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1982)
    at org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:702)
    at org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:187)
    at org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:887)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1762)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
    at org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
    at org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
    at org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
    at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
    at org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:306)
    at org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:100)
    at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:80)
    at org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10493)
    at org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:828)
    at org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1134)

Analyzing this stack trace, I found that unmarshalling a user object for the first time (per type) causes a binary metadata request (even though I registered the type via BinaryConfiguration.setTypeConfigurations), and the corresponding futures are only completed after the subsequent MetadataResponseMessage is received and processed on the client node. But MetadataResponseMessage (GridTopic.TOPIC_METADATA_REQ) is also processed in the system pool (I see that GridIoManager#processRegularMessage routes it to the system pool). So it causes a deadlock, as the system pool is already full.
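To make the pattern concrete, here is a self-contained illustration (plain JDK code, not Ignite internals) of the kind of starvation I believe is happening: every pool thread blocks on a future that can only be completed by another task submitted to the same pool, so that task never runs.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSelfDeadlock {
    public static void main(String[] args) {
        int poolSize = 3; // same size as my system pool
        ExecutorService sysPool = Executors.newFixedThreadPool(poolSize);

        // Stands in for the binary metadata future that the job-response
        // handlers are parked on (GridFutureAdapter.get() in the dump above).
        CompletableFuture<Void> metadataFut = new CompletableFuture<>();

        // "Job responses": each one occupies a pool thread and waits for metadata.
        for (int i = 0; i < poolSize; i++)
            sysPool.submit(() -> { metadataFut.join(); });

        // "MetadataResponseMessage": routed to the same pool, but all workers are
        // already blocked, so this task never runs and the future never completes.
        sysPool.submit(() -> { metadataFut.complete(null); });

        // The JVM now hangs forever with all pool threads parked.
    }
}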
I would appreciate any feedback on this. Am I just shooting myself in the foot, or should I file a ticket for this? Is it correct that both job responses and metadata responses are processed in the system pool, and if so, how is this case supposed to be handled? I suppose that even with bigger pool sizes there is still a chance of hitting this situation if many concurrent jobs are invoked from the same client node. Or is there some golden rule to follow - something like invoking no more than (system pool size - 1) concurrent jobs from the same node?

Kind regards,
Sergey Kosarev