Csaba Ringhofer created IMPALA-13424:
----------------------------------------

             Summary: Improve diagnostics for OOM in JVM
                 Key: IMPALA-13424
                 URL: https://issues.apache.org/jira/browse/IMPALA-13424
             Project: IMPALA
          Issue Type: Improvement
          Components: Catalog, Frontend
            Reporter: Csaba Ringhofer


By default impalad/catalogd can survive a failed allocation in Java and throws 
an exception that may abort the given thread instead of crashing the whole 
process. There is already a ticket (IMPALA-1956) about changing the default 
behavior.

Tested locally by allocating till OOM during HdfsTable.load(), which lead to 
throwing OutOfMemoryError in catalogd. This lead to failing to load the table 
and catalogd + coordinator could function properly after this. Error reporting 
for client / in coordinator log seemed ok, but catalogd has no log lines or 
metrics that clearly identify the issue:
client:
{code}
AnalysisException: Failed to load metadata for table: 'functional.alltypestiny'
CAUSED BY: TableLoadingException: Failed to load metadata for table: 
functional.alltypestiny. Running 'invalidate metadata functional.alltypestiny' 
may resolve this problem.
CAUSED BY: OutOfMemoryError: Java heap space
{code}

coordinator log:
{code}
I1007 16:48:41.760563 126482 jni-util.cc:321] 
604b67244b0793c6:32c416c000000000] org.apache.impala.common.AnalysisException: 
Failed to load metadata for table: 'functional.alltypestiny'
        at 
org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:990)
        at org.apache.impala.analysis.FromClause.analyze(FromClause.java:87)
        at 
org.apache.impala.analysis.SelectStmt$SelectAnalyzer.analyze(SelectStmt.java:330)
        at 
org.apache.impala.analysis.SelectStmt$SelectAnalyzer.access$100(SelectStmt.java:282)
        at org.apache.impala.analysis.SelectStmt.analyze(SelectStmt.java:274)
        at 
org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:558)
        at 
org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:505)
        at 
org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2586)
        at 
org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2268)
        at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:2029)
        at 
org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load 
metadata for table: functional.alltypestiny. Running 'invalidate metadata 
functional.alltypestiny' may resolve this problem.
CAUSED BY: OutOfMemoryError: Java heap space
        at 
org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:181)
        at org.apache.impala.catalog.Table.fromThrift(Table.java:591)
        at 
org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:489)
        at 
org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:344)
        at 
org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
        at 
org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
        at 
org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:590)
        at 
org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
        at ========.<Remote stack trace on catalogd>: 
org.apache.impala.catalog.TableLoadingException: Failed to load metadata for 
table: functional.alltypestiny. Running 'invalidate metadata 
functional.alltypestiny' may resolve this problem.
        at org.apache.impala.catalog.TableLoader.load(TableLoader.java:168)
        at 
org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251)
        at 
org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.OutOfMemoryError: Java heap space
()
{code}

catalogd log:
{code}
I1007 16:48:33.565677 126490 TableLoader.java:79] Loading metadata for: 
functional.alltypestiny (background load)
...
I1007 16:48:38.755443 126490 TableLoader.java:177] Loaded metadata for: 
functional.alltypestiny (5189ms)
I1007 16:48:38.755504 114205 JvmPauseMonitor.java:209] Detected pause in JVM or 
host machine (eg GC): pause of approximately 1357ms
GC pool 'PS MarkSweep' had collection(s): count=2 time=1702ms
{code}

The oom doesn't lead to any warning/error level logs. Ideally there should be 
an error level log with the name of the table and the callstack of the failed 
allocation.





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to