[https://issues.apache.org/jira/browse/IMPALA-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302139#comment-17302139]

Quanlong Huang commented on IMPALA-10420:
-----------------------------------------

[[email protected]] Thanks for reporting this issue! I have some questions:

Which catalog mode are you using? If you are using the legacy catalog mode, 
could you try the local catalog mode? (A sketch of the relevant startup flags 
is below.)
 Apache Ref: 
[https://impala.apache.org/docs/build/html/topics/impala_metadata.html]
 CDH Ref: 
[https://docs.cloudera.com/best-practices/latest/impala-performance/topics/bp-impala-enable-on-demand-metadata-fetch.html]
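
For reference, a minimal sketch of the startup flags that enable local catalog 
(on-demand metadata) mode; the flag names come from the docs linked above, so 
please verify them against your release:

    # impalad: use the on-demand local catalog cache
    --use_local_catalog=true

    # catalogd: send minimal topic updates instead of full metadata
    --catalog_topic_mode=minimal

In a Cloudera Manager deployment these are normally set through the Impala 
service configuration rather than passed on the command line by hand.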

What's the scale of your warehouse, e.g. the number of tables, and the number 
of partitions and files of the largest table? This helps us understand the size 
of the metadata footprint. (A few illustrative impala-shell statements for 
gathering these numbers are sketched below.)
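
As an illustration (not part of the original question), these numbers can 
usually be gathered from impala-shell with statements like the following, where 
some_db and big_table are placeholder names:

    -- rough table counts per database
    SHOW DATABASES;
    SHOW TABLES IN some_db;
    -- per-partition #Rows, #Files, Size for a large table
    SHOW TABLE STATS some_db.big_table;
    -- partition listing for the largest table
    SHOW PARTITIONS some_db.big_table;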

> Impala slow performance and crash
> ---------------------------------
>
>                 Key: IMPALA-10420
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10420
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Manikandan R
>            Priority: Major
>
> At times, an Impala daemon performs very badly for some time (slightly more 
> than 1 hour) and then crashes with OOM errors.
> Stack trace:
> I1229 18:58:30.675091 108919 Frontend.java:874] Waiting for local catalog to 
> be initialized, attempt: 41
> I1229 18:58:32.675457 108919 Frontend.java:874] Waiting for local catalog to 
> be initialized, attempt: 42
> I1229 19:24:11.218081 108919 Frontend.java:874] Waiting for local catalog to 
> be initialized, attempt: 99
> I1229 19:24:11.632340 109479 jni-util.cc:211] java.lang.OutOfMemoryError: GC 
> overhead limit exceeded
>         at java.util.Arrays.copyOfRange(Arrays.java:3664)
>         at java.lang.String.<init>(String.java:207)
>         at java.lang.StringBuilder.toString(StringBuilder.java:407)
>         at 
> org.apache.hadoop.hive.common.FileUtils.escapePathName(FileUtils.java:287)
>         at 
> org.apache.hadoop.hive.common.FileUtils.makePartName(FileUtils.java:153)
> During this 1-hour-or-so period, we see a lot of retries in the logs: catalog 
> initialization attempts, heap space errors, catalog update failures, etc.
> I1229 18:27:21.157497 80168 status.cc:125] OutOfMemoryError: Java heap space
>   @      0x95b479 impala::Status::Status()
>   @      0xca3f22 impala::JniUtil::GetJniExceptionMsg()
>   @      0xba3be8 impala::Frontend::UpdateCatalogCache()
>   @      0xbc1589 impala::ImpalaServer::CatalogUpdateCallback()
>   @      0xc62c73 impala::StatestoreSubscriber::UpdateState()
>   @      0xc68963 impala::StatestoreSubscriberThriftIf::UpdateState()
>   @     0x10f0fc8 impala::StatestoreSubscriberProcessor::process_UpdateState()
>   @     0x10f0204 impala::StatestoreSubscriberProcessor::dispatchCall()
>   @      0x92bb4c apache::thrift::TDispatchProcessor::process()
>   @      0xafc6df apache::thrift::server::TAcceptQueueServer::Task::run()
>   @      0xaf6fd5 impala::ThriftThread::RunRunnable()
>   @      0xaf7db2 
> boost::detail::function::void_function_obj_invoker0<>::invoke()
>   @      0xd16c83 impala::Thread::SuperviseThread()
>   @      0xd173c4 boost::detail::thread_data<>::run()
>   @     0x128fada (unknown)
>   @   0x7f4328a89ea5 start_thread
>   @   0x7f43287b28dd __clone
> E1229 18:27:21.157521 80168 impala-server.cc:1454] There was an error 
> processing the impalad catalog update. Requesting a full topic update to 
> recover: OutOfMemoryError: Java heap space
> I1229 17:06:27.922144 93138 status.cc:125] OutOfMemoryError: GC overhead 
> limit exceeded
>   @      0x95b479 impala::Status::Status()
>   @      0xca3f22 impala::JniUtil::GetJniExceptionMsg()
>   @      0xba3be8 impala::Frontend::UpdateCatalogCache()
>   @      0xbc1589 impala::ImpalaServer::CatalogUpdateCallback()
>   @      0xc62c73 impala::StatestoreSubscriber::UpdateState()
>   @      0xc68963 impala::StatestoreSubscriberThriftIf::UpdateState()
>   @     0x10f0fc8 impala::StatestoreSubscriberProcessor::process_UpdateState()
>   @     0x10f0204 impala::StatestoreSubscriberProcessor::dispatchCall()
>   @      0x92bb4c apache::thrift::TDispatchProcessor::process()
>   @      0xafc6df apache::thrift::server::TAcceptQueueServer::Task::run()
>   @      0xaf6fd5 impala::ThriftThread::RunRunnable()
>   @      0xaf7db2 
> boost::detail::function::void_function_obj_invoker0<>::invoke()
>   @      0xd16c83 impala::Thread::SuperviseThread()
>   @      0xd173c4 boost::detail::thread_data<>::run()
>   @     0x128fada (unknown)
>   @   0x7f4328a89ea5 start_thread
>   @   0x7f43287b28dd __clone
> E1229 17:06:27.922240 93138 impala-server.cc:1454] There was an error 
> processing the impalad catalog update. Requesting a full topic update to 
> recover: OutOfMemoryError: GC overhead limit exceeded
> After a restart, the daemon comes up again, so our usual recovery is to 
> restart the catalog service first and then all Impala daemons one by one to 
> bring the cluster back to a stable state. While this particular daemon is in 
> a bad state, the whole cluster service degrades (evident from query 
> completion times), because some fragments of queries are still routed to this 
> daemon over backend connections. We restart the catalog first and the Impala 
> daemons later because we suspect that, for some reason, loading catalog 
> metadata into the daemon's own cache is creating memory pressure on that 
> daemon.
> On the second occurrence, we ran an “invalidate metadata” command on another 
> Impala daemon and restarted the bad daemon. This approach also helped.
> Net-net, our observation and main suspect is the loading of catalog metadata 
> into the Impala daemon’s own cache. During this 1-hour-or-so period we don’t 
> see any abnormality in the Cloudera Manager dashboard metrics, especially 
> mem_rss, tcmalloc metrics, etc. Since there is no sign of a health issue, 
> other Impala daemons keep forwarding fragments, and the Impala load balancer 
> keeps forwarding client requests, to this problematic daemon, which degrades 
> the whole service.
> Also, we had come across this -
> https://issues.apache.org/jira/browse/IMPALA-5459



