[
https://issues.apache.org/jira/browse/IGNITE-12086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
radhakrupa updated IGNITE-12086:
--------------------------------
Description:
Ignite has been deployed on Kubernetes with 3 replicas of the server pod.
The pods were up and running fine for 9 days. We have created 180 inventory
tables and 204 transactional tables. The data has been inserted using the
PyIgnite client with the cache.put() method. This is a very slow operation
because each insert is committed one at a time, so no bulk-style inserts
are possible. PyIgnite was inserting into about 20 of the inventory tables
simultaneously (20 different threads/processes).
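As a point of comparison for the one-put-per-row loading described above, a
batched load could look roughly like the sketch below. It assumes pyignite's
Cache.put_all(), which takes a dict of key/value pairs. The helper names
batched and put_in_batches, the batch size of 1000, and the cache name in the
usage comment are hypothetical and are not taken from the actual loader.
Whether batching has any bearing on the crash is not established here.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, Tuple


def batched(pairs: Iterable[Tuple[Any, Any]], size: int) -> Iterator[Dict[Any, Any]]:
    """Yield dicts holding at most `size` key/value pairs each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(pairs)
    while True:
        chunk = dict(islice(it, size))
        if not chunk:
            return
        yield chunk


def put_in_batches(cache: Any, pairs: Iterable[Tuple[Any, Any]], size: int = 1000) -> int:
    """Write pairs through cache.put_all() in chunks; return the number of calls made."""
    calls = 0
    for chunk in batched(pairs, size):
        # One request per chunk instead of one cache.put() per row.
        cache.put_all(chunk)
        calls += 1
    return calls


# Usage sketch (host, port, and cache name are placeholders):
#   from pyignite import Client
#   client = Client()
#   client.connect("127.0.0.1", 10800)
#   cache = client.get_or_create_cache("inventory_001")
#   put_in_batches(cache, rows, size=1000)  # rows: iterable of (key, value)
```

Each chunk becomes a single put_all() request, so the number of round trips
drops from one per row to one per batch.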
After 9 days the cluster became unstable: one of the pods crashed and failed
to recover. Below is the error log:
{"type":"log","host":"ignite-cluster-ignite-esoc-2","level":"ERROR","system":"ignite-service","time":"2019-08-16T17:13:34,769Z","logger":"GridCachePartitionExchangeManager","timezone":"UTC","log":"Failed
to process custom exchange task: ClientCacheChangeDummyDiscoveryMessage
[reqId=6b5f6c50-a8c9-4b04-a461-49bfd0112eb0, cachesToClose=null,
startCaches=[BgwService]] java.lang.NullPointerException| at
org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.processClientCachesChanges(CacheAffinitySharedManager.java:635)|
at
org.apache.ignite.internal.processors.cache.GridCacheProcessor.processCustomExchangeTask(GridCacheProcessor.java:391)|
at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.processCustomTask(GridCachePartitionExchangeManager.java:2475)|
at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2620)|
at
org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2539)|
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)|
at java.lang.Thread.run(Thread.java:748)"}
{"type":"log","host":"ignite-cluster-ignite-esoc-2","level":"WARN","system":"ignite-service","time":"2019-08-16T17:13:36,724Z","logger":"GridCacheDatabaseSharedManager","timezone":"UTC","log":"Ignite
node stopped in the middle of checkpoint. Will restore memory state and finish
checkpoint on node start."}
The error report file and ignite-config.xml have been attached for your
information.
The heap memory and RAM configuration on each Ignite server container is as
follows:
Heap Memory: 32GB
RAM: 64GB
Default memory region:
cpu: 4
Persistence volume
wal_storage_size: 10GB
persistence_storage_size: 10GB
> Ignite pod keeps crashing and failed to recover the node
> ---------------------------------------------------------
>
> Key: IGNITE-12086
> URL: https://issues.apache.org/jira/browse/IGNITE-12086
> Project: Ignite
> Issue Type: Bug
> Affects Versions: 2.7
> Reporter: radhakrupa
> Priority: Major
> Attachments: hs_err_pid116.log, ignite-config.xml
>