Csaba Ringhofer created IMPALA-13424:
----------------------------------------
Summary: Improve diagnostics for OOM in JVM
Key: IMPALA-13424
URL: https://issues.apache.org/jira/browse/IMPALA-13424
Project: IMPALA
Issue Type: Improvement
Components: Catalog, Frontend
Reporter: Csaba Ringhofer
By default impalad/catalogd can survive a failed allocation in Java and throws
an exception that may abort the given thread instead of crashing the whole
process. There is already a ticket (IMPALA-1956) about changing the default
behavior.
Tested locally by allocating till OOM during HdfsTable.load(), which lead to
throwing OutOfMemoryError in catalogd. This lead to failing to load the table
and catalogd + coordinator could function properly after this. Error reporting
for client / in coordinator log seemed ok, but catalogd has no log lines or
metrics that clearly identify the issue:
client:
{code}
AnalysisException: Failed to load metadata for table: 'functional.alltypestiny'
CAUSED BY: TableLoadingException: Failed to load metadata for table:
functional.alltypestiny. Running 'invalidate metadata functional.alltypestiny'
may resolve this problem.
CAUSED BY: OutOfMemoryError: Java heap space
{code}
coordinator log:
{code}
I1007 16:48:41.760563 126482 jni-util.cc:321]
604b67244b0793c6:32c416c000000000] org.apache.impala.common.AnalysisException:
Failed to load metadata for table: 'functional.alltypestiny'
at
org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:990)
at org.apache.impala.analysis.FromClause.analyze(FromClause.java:87)
at
org.apache.impala.analysis.SelectStmt$SelectAnalyzer.analyze(SelectStmt.java:330)
at
org.apache.impala.analysis.SelectStmt$SelectAnalyzer.access$100(SelectStmt.java:282)
at org.apache.impala.analysis.SelectStmt.analyze(SelectStmt.java:274)
at
org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:558)
at
org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:505)
at
org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2586)
at
org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2268)
at
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:2029)
at
org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load
metadata for table: functional.alltypestiny. Running 'invalidate metadata
functional.alltypestiny' may resolve this problem.
CAUSED BY: OutOfMemoryError: Java heap space
at
org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:181)
at org.apache.impala.catalog.Table.fromThrift(Table.java:591)
at
org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:489)
at
org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:344)
at
org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
at
org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
at
org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:590)
at
org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
at ========.<Remote stack trace on catalogd>:
org.apache.impala.catalog.TableLoadingException: Failed to load metadata for
table: functional.alltypestiny. Running 'invalidate metadata
functional.alltypestiny' may resolve this problem.
at org.apache.impala.catalog.TableLoader.load(TableLoader.java:168)
at
org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251)
at
org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.OutOfMemoryError: Java heap space
()
{code}
catalogd log:
{code}
I1007 16:48:33.565677 126490 TableLoader.java:79] Loading metadata for:
functional.alltypestiny (background load)
...
I1007 16:48:38.755443 126490 TableLoader.java:177] Loaded metadata for:
functional.alltypestiny (5189ms)
I1007 16:48:38.755504 114205 JvmPauseMonitor.java:209] Detected pause in JVM or
host machine (eg GC): pause of approximately 1357ms
GC pool 'PS MarkSweep' had collection(s): count=2 time=1702ms
{code}
The oom doesn't lead to any warning/error level logs. Ideally there should be
an error level log with the name of the table and the callstack of the failed
allocation.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)