Hi,

Here is a reproducer for master: https://github.com/macrergate/ignite/commit/9a7d2d27af30018a5f6faccb39176a35243ccfa2

Best regards,
Sergey Kosarev.
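For readers who want the shape of the reproducer without opening the commit, here is a minimal sketch of the setup described further down in this thread: 3 server nodes, a client with the system pool shrunk to 3 (the IGNITE-12663 idea of minimizing thread pools), and a compute broadcast whose results carry a user-defined type. All class and type names here are illustrative assumptions, not names taken from the linked commit.

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.lang.IgniteCallable;

    public class SystemPoolDeadlockSetup {
        public static void main(String[] args) {
            // 3 server nodes, as in the tests described below.
            for (int i = 0; i < 3; i++)
                Ignition.start(new IgniteConfiguration()
                    .setIgniteInstanceName("server-" + i));

            // Client node with the system pool deliberately shrunk to 3
            // (the IGNITE-12663 idea of minimizing thread pools).
            Ignite client = Ignition.start(new IgniteConfiguration()
                .setIgniteInstanceName("client")
                .setClientMode(true)
                .setSystemThreadPoolSize(3));

            // Call a job on every server node. Each job returns a
            // user-defined type, so unmarshalling the responses on the
            // client triggers a binary metadata request the first time
            // the type is seen.
            client.compute().broadcast((IgniteCallable<Payload>)Payload::new);
        }

        /** Hypothetical user type carried in the job results. */
        public static class Payload {
        }
    }

On a pool this small, the three response handlers can occupy every system pool worker at once, which is the precondition for the deadlock analyzed below.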
On Tue, 17 Mar 2020 at 12:27, Sergey-A Kosarev <sergey-a.kosa...@db.com> wrote:

> Classification: Public
>
> Hi Sergey,
> The ticket is here: https://issues.apache.org/jira/browse/IGNITE-12793
>
> I will try to make a reproducer in the coming days.
>
> Kind regards,
> Sergey Kosarev
>
> -----Original Message-----
> From: Sergey Chugunov [mailto:sergey.chugu...@gmail.com]
> Sent: 17 March 2020 09:45
> To: dev <dev@ignite.apache.org>
> Subject: Re: deadlock in system pool on meta update
>
> Hello Sergey,
>
> Your analysis looks valid to me; we definitely need to investigate this
> deadlock and find out how to fix it.
>
> Could you create a ticket and write a test that reproduces the issue
> with sufficient probability?
>
> Thanks!
>
> On Mon, Mar 16, 2020 at 8:22 PM Sergey-A Kosarev <sergey-a.kosa...@db.com> wrote:
>
> > Classification: Public
> >
> > Hi,
> > I've recently tried to apply Ilya's idea
> > (https://issues.apache.org/jira/browse/IGNITE-12663) of minimizing
> > thread pools and set the system pool size to 3 in my own tests.
> > It caused a deadlock on a client node, and I think it can happen not
> > only with such small pool values.
> >
> > The details are as follows:
> > I'm not using persistence currently (if it matters).
> > On the client node I use Ignite compute to call a job on every server
> > node (there are 3 server nodes in the tests).
> >
> > Then I found the following in the logs:
> >
> > [10:55:21] : [Step 1/1] [2020-03-13 10:55:21,773] {grid-timeout-worker-#8} [WARN] [o.a.i.i.IgniteKernal] - Possible thread pool starvation detected (no task completed in last 30000ms, is system thread pool size large enough?)
> > [10:55:21] : [Step 1/1] ^-- System thread pool [active=3, idle=0, qSize=9]
> >
> > I see in the thread dumps that all 3 system pool workers are doing the
> > same thing - processing job responses:
> >
> > "sys-#26" #605 daemon prio=5 os_prio=0 tid=0x0000000064a0a800 nid=0x1f34 waiting on condition [0x000000007b91d000]
> >    java.lang.Thread.State: WAITING (parking)
> >     at sun.misc.Unsafe.park(Native Method)
> >     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
> >     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
> >     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
> >     at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl.metadata(CacheObjectBinaryProcessorImpl.java:749)
> >     at org.apache.ignite.internal.processors.cache.binary.CacheObjectBinaryProcessorImpl$1.metadata(CacheObjectBinaryProcessorImpl.java:250)
> >     at org.apache.ignite.internal.binary.BinaryContext.metadata(BinaryContext.java:1169)
> >     at org.apache.ignite.internal.binary.BinaryReaderExImpl.getOrCreateSchema(BinaryReaderExImpl.java:2005)
> >     at org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:285)
> >     at org.apache.ignite.internal.binary.BinaryReaderExImpl.<init>(BinaryReaderExImpl.java:184)
> >     at org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
> >     at org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
> >     at org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
> >     at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
> >     at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
> >     at org.apache.ignite.internal.binary.BinaryReaderExImpl.readField(BinaryReaderExImpl.java:1982)
> >     at org.apache.ignite.internal.binary.BinaryFieldAccessor$DefaultFinalClassAccessor.read0(BinaryFieldAccessor.java:702)
> >     at org.apache.ignite.internal.binary.BinaryFieldAccessor.read(BinaryFieldAccessor.java:187)
> >     at org.apache.ignite.internal.binary.BinaryClassDescriptor.read(BinaryClassDescriptor.java:887)
> >     at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1762)
> >     at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
> >     at org.apache.ignite.internal.binary.BinaryUtils.doReadObject(BinaryUtils.java:1797)
> >     at org.apache.ignite.internal.binary.BinaryUtils.deserializeOrUnmarshal(BinaryUtils.java:2160)
> >     at org.apache.ignite.internal.binary.BinaryUtils.doReadCollection(BinaryUtils.java:2091)
> >     at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize0(BinaryReaderExImpl.java:1914)
> >     at org.apache.ignite.internal.binary.BinaryReaderExImpl.deserialize(BinaryReaderExImpl.java:1714)
> >     at org.apache.ignite.internal.binary.GridBinaryMarshaller.deserialize(GridBinaryMarshaller.java:306)
> >     at org.apache.ignite.internal.binary.BinaryMarshaller.unmarshal0(BinaryMarshaller.java:100)
> >     at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:80)
> >     at org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:10493)
> >     at org.apache.ignite.internal.processors.task.GridTaskWorker.onResponse(GridTaskWorker.java:828)
> >     at org.apache.ignite.internal.processors.task.GridTaskProcessor.processJobExecuteResponse(GridTaskProcessor.java:1134)
> >
> > As I found while analyzing this stack trace, unmarshalling a user
> > object for the first time (per type) triggers a binary metadata
> > request, even though I've registered the type via
> > BinaryConfiguration.setTypeConfigurations.
> >
> > All these futures will only complete once the corresponding
> > MetadataResponseMessage is received and processed on the client node.
> >
> > But MetadataResponseMessage (GridTopic.TOPIC_METADATA_REQ) is also
> > processed in the system pool (I see that
> > GridIoManager#processRegularMessage routes it there). So it causes a
> > deadlock, as the system pool is already full.
> >
> > I will appreciate any feedback on the topic.
> > Am I just shooting myself in the foot, or should I create a ticket
> > for this?
> > Is it correct that both job responses and metadata responses are
> > processed in the system pool, and if so, how should this case be
> > handled?
> > I suppose that with bigger pool sizes there is still a chance of
> > getting into such a situation if many concurrent jobs are invoked
> > from the same client node. Or is there some golden rule to follow -
> > something like: no more than (system pool capacity - 1) concurrent
> > jobs should be invoked from the same node?
> >
> > Kind regards,
> > Sergey Kosarev
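To make the failure mode concrete: the pattern in the analysis above - tasks in a bounded pool blocking on a result that only the same pool can produce - can be demonstrated with plain java.util.concurrent and no Ignite at all. In this self-contained sketch the names mirror the roles in the report (the pool stands in for the system pool, the future for the pending binary metadata request); it is an illustration of the pattern, not Ignite's actual code path.

    import java.util.concurrent.*;

    public class PoolSelfDeadlockDemo {
        public static void main(String[] args) throws Exception {
            // Stands in for the system pool: 3 workers, as in the report.
            ExecutorService sysPool = Executors.newFixedThreadPool(3);

            // Stands in for the pending binary metadata future that each
            // job-response handler blocks on (GridFutureAdapter.get()
            // in the stack trace above).
            CompletableFuture<Void> metadataResponse = new CompletableFuture<>();

            // Three "job response" tasks occupy all three workers and park.
            for (int i = 0; i < 3; i++) {
                sysPool.submit(() -> {
                    metadataResponse.get(); // parks, like GridFutureAdapter.get0()
                    return null;
                });
            }

            // The "metadata response" task that would complete the future
            // goes into the same pool - and sits in the queue forever,
            // because every worker is parked waiting for it.
            Future<?> response = sysPool.submit(() -> metadataResponse.complete(null));

            // Demonstrate the hang without blocking the demo forever.
            try {
                response.get(2, TimeUnit.SECONDS);
                System.out.println("no deadlock");
            }
            catch (TimeoutException e) {
                System.out.println("deadlock: response task never ran, pool is starved");
            }
            sysPool.shutdownNow(); // interrupt the parked workers so the JVM exits
        }
    }

This also illustrates the "golden rule" question the report ends with: once every worker of a bounded pool blocks on work that is queued behind it in the same pool, no progress is possible at any pool size - a larger pool only lowers the probability of reaching that state.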
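For reference, the registration the report mentions looks something like the sketch below ("com.example.Payload" is a placeholder type name, not one from the reproducer). Per the observation in the thread, listing a type this way registers it with the binary context but does not spare the node the first remote metadata request for it - which is exactly the request that parks the system pool workers in the stack trace above.

    import java.util.Collections;

    import org.apache.ignite.binary.BinaryTypeConfiguration;
    import org.apache.ignite.configuration.BinaryConfiguration;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class BinaryTypeRegistration {
        // Registers a placeholder type with the binary marshaller; per the
        // report, this alone does not avoid the first metadata request.
        static IgniteConfiguration configure() {
            return new IgniteConfiguration()
                .setBinaryConfiguration(new BinaryConfiguration()
                    .setTypeConfigurations(Collections.singletonList(
                        new BinaryTypeConfiguration("com.example.Payload"))));
        }
    }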