Hi folks, I've got an Impala CDH 5.14.2 cluster with a handful of users (2-3 at any one time). All of a sudden the JVM inside the Impalad started running out of memory.
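As an aside, before wrestling with the full dump, a class histogram from `jmap -histo <pid>` is a much cheaper first look at which classes dominate the heap. A minimal sketch of sorting such a histogram by instance count (the sample output below is made up for illustration, not from my dump, and `top_classes` is just a name I'm using here):

```python
# Rank classes from `jmap -histo <pid>` output by instance count.
# SAMPLE_HISTO is illustrative, hand-written sample output.
SAMPLE_HISTO = """\
 num     #instances         #bytes  class name
----------------------------------------------
   1:        176836       42440640  org.apache.impala.analysis.Analyzer
   2:         11048        1060608  org.apache.impala.analysis.Analyzer$GlobalState
   3:            13          12480  org.apache.impala.catalog.HdfsTable
"""

def top_classes(histo_text, n=3):
    rows = []
    for line in histo_text.splitlines():
        parts = line.split()
        # Data rows look like: "<rank>:", instances, bytes, class name.
        if len(parts) == 4 and parts[0].rstrip(":").isdigit():
            rows.append((int(parts[1]), int(parts[2]), parts[3]))
    rows.sort(reverse=True)  # most instances first
    return rows[:n]

for count, nbytes, cls in top_classes(SAMPLE_HISTO):
    print(f"{count:>10}  {cls}")
```

With `jmap -histo:live` the JVM first runs a full GC, so the counts reflect only reachable objects.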
I got a heap dump, but the heap was 32GB (the host has 240GB), so it's very large. As a result I wasn't able to get Memory Analyzer Tool (MAT) to open it. I was able to get JHAT to open it by setting JHAT's heap to 160GB, but it's pretty unwieldy, so much of the JHAT functionality doesn't work. I am spelunking around, but am really curious whether there are some places I should check.

I am only an occasional reader of the Impala source, so I am just pointing out things which felt interesting:

* Impalad was restarted shortly before the JVM OOM
* We are joining Parquet on S3 with Kudu
* Only 13 instances of org.apache.impala.catalog.HdfsTable
* 176836 instances of org.apache.impala.analysis.Analyzer - this feels odd to me. I remember a bug a while back in Hive where it would clone the query tree until it ran OOM.
* 176796 of those _user fields point at the same user
* org.apache.impala.thrift.TQueryCtx@0x7f90975297f8 has 11048 org.apache.impala.analysis.Analyzer$GlobalState objects pointing at it.
* There is only a single instance of org.apache.impala.thrift.TQueryCtx alive in the JVM, which appears to indicate there is only a single query running. I've tracked that query down in CM.

The users need to compute stats, but I don't feel that is relevant to this JVM OOM condition.

Any pointers on what I might look for?

Cheers,
Brock
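P.S. In case it helps anyone reproduce the counts: once JHAT has loaded the dump, it serves an OQL page at http://localhost:7000/oql/, and queries along these lines (run one at a time; untested against this particular dump) give the instance counts above:

```
select count(heap.objects("org.apache.impala.analysis.Analyzer"))

select count(heap.objects("org.apache.impala.analysis.Analyzer$GlobalState"))
```

The built-in `referrers(obj)` function can then walk backwards from a specific object (e.g. the lone TQueryCtx) to see what is holding it.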