Kent Yao created SPARK-25174:
--------------------------------

             Summary: ApplicationMaster suspends when unregistering itself from 
RM with extreme large diagnostic message
                 Key: SPARK-25174
                 URL: https://issues.apache.org/jira/browse/SPARK-25174
             Project: Spark
          Issue Type: Bug
          Components: YARN
    Affects Versions: 2.1.1
            Reporter: Kent Yao


We recently ran into SPARK-18016, which has been fixed in v2.3.0. This JIRA is 
not about SPARK-18016 itself but about a side effect it causes. When 
SPARK-18016 occurs, the ApplicationMaster fails to unregister itself because 
the exception carries an extremely large error message.

{code:java}
ERROR yarn.ApplicationMaster: User class threw exception: 
java.lang.RuntimeException: Error while decoding: 
java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
compile: org.codehaus.janino.JaninoRuntimeException: Constant pool has grown 
past JVM limit of 0xFFFF
/* 001 */ public java.lang.Object generate(Object[] references) {
....

/* 395656 */       mutableRow.update(0, value);
/* 395657 */     }
/* 395658 */
/* 395659 */     return mutableRow;
/* 395660 */   }
/* 395661 */ }
{code}

The codegen text above is included in the final message the AM sends to the RM 
when unregistering. That message ends up crashing the RM's ZKRMStateStore, 
because YARN-6125 does not cover truncation of the 
unregisterApplicationMaster diagnostics. We have also created a JIRA on the 
YARN side: https://issues.apache.org/jira/browse/YARN-8691 
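
One possible mitigation on the AM side is to cap the diagnostics string before 
it is handed to the unregister call. The sketch below only illustrates that 
idea; the class name DiagnosticsTruncator, the method truncate, and the 4096 
character default are hypothetical and are not taken from Spark or YARN.

```java
// Hypothetical helper, not part of Spark or YARN: caps a diagnostics string
// so that a huge codegen dump is not forwarded to the RM in full.
final class DiagnosticsTruncator {
    // Assumed default cap; the right value depends on the RM state store.
    static final int DEFAULT_MAX_CHARS = 4096;

    // Marker appended when text is cut, so readers know it is incomplete.
    private static final String MARKER = "... [truncated]";

    private DiagnosticsTruncator() {}

    // Returns a string of at most maxChars characters.
    // A null message becomes the empty string.
    static String truncate(String message, int maxChars) {
        if (maxChars < 0) {
            throw new IllegalArgumentException("maxChars must be >= 0");
        }
        if (message == null) {
            return "";
        }
        if (message.length() <= maxChars) {
            return message;
        }
        if (maxChars <= MARKER.length()) {
            // No room for the marker; keep only the head of the message.
            return message.substring(0, maxChars);
        }
        // Keep the head (the exception summary) and end with the marker.
        return message.substring(0, maxChars - MARKER.length()) + MARKER;
    }
}
```

The head of the message is kept because the exception class and summary 
appear first, while the generated source that follows is the part that grows 
without bound.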




