[ https://issues.apache.org/jira/browse/TEZ-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sreenath Somarajapuram updated TEZ-3347:
----------------------------------------
Summary: Tez UI: Vertex UI throws an error while getting vertexProgress for
a killed Vertex (was: Vertex UI throws an error while getting vertexProgress
for a killed Vertex)
> Tez UI: Vertex UI throws an error while getting vertexProgress for a killed
> Vertex
> ----------------------------------------------------------------------------------
>
> Key: TEZ-3347
> URL: https://issues.apache.org/jira/browse/TEZ-3347
> Project: Apache Tez
> Issue Type: Bug
> Components: UI
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: ErrorCodeFailedVertex.png, TEZ-3347.001.patch,
> TEZ-3347.002.patch
>
>
> When an AM fails all of its attempts, the application fails, and the very
> first click on the killed/failed vertex throws the following error:
> {code}
> error code: Unknown, message: expected expression, got '<'
> {code}
> It corrects itself if retried immediately after the failure.
> This happens because the RM proxy redirects the call to the AHS (Application
> History Server), and the REST call is malformed for that server. Inspecting
> the responses showed a URL like this:
> {code}
> http://<hostname>:<ahsport>/applicationhistory/app/application_123_456/ws/v1/tez/vertexProgress?dagID=1&vertexID=01&_=123
> {code}
> which is not a valid REST call on the AHS. The "got '<'" in the error
> suggests that an HTML page came back instead of JSON.
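The following is a minimal sketch of a check that would turn this opaque error into a readable one. It assumes the caller has the response's Content-Type header and raw body text. parseVertexProgressBody is a hypothetical helper, not part of the Tez UI. It rejects an HTML body before parsing, because HTML begins with '<' and valid JSON text never does.

```javascript
// Hypothetical helper (not Tez UI code): parse a vertexProgress response body,
// rejecting HTML pages such as one served by the AHS after an RM redirect.
function parseVertexProgressBody(contentType, body) {
  const type = (contentType || "").toLowerCase();
  const trimmed = body.trimStart();
  // An HTML page begins with '<'; valid JSON text never does.
  if (type.includes("text/html") || trimmed.startsWith("<")) {
    throw new Error(
      "vertexProgress returned HTML instead of JSON; the request was likely " +
        "redirected away from the AM (for example, to the AHS)."
    );
  }
  return JSON.parse(trimmed);
}

// Usage: a JSON body parses normally, and an HTML body fails with a clear message.
console.log(parseVertexProgressBody("application/json", '{"progress": 0.5}').progress); // 0.5
try {
  parseVertexProgressBody("text/html", "<html><body>AHS page</body></html>");
} catch (e) {
  console.log(e.message);
}
```

The Content-Type test and the first-character test overlap on purpose. A proxy may mislabel the page, so the body check acts as a backstop.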
> The following code appears to cause this issue:
> {code}
> // Load progress in parallel for v1 version of the api
> _loadProgress: function (vertices) {
>   var that = this,
>       runningVerticesIdx = vertices
>         .filterBy('status', 'RUNNING')
>         .map(function(item) {
>           // Last '_'-separated segment of the vertex ID
>           return item.get('id').split('_').splice(-1).pop();
>         });
>   if (runningVerticesIdx.length > 0) {
>     this.store.unloadAll('vertexProgress');
>     this.store.findQuery('vertexProgress', {
>       metadata: {
>         appId: that.get('applicationId'),
>         dagIdx: that.get('idx'),
>         vertexIds: runningVerticesIdx.join(',')
>       }
>     }).then(function(vertexProgressInfo) {
>       App.Helpers.emData.mergeRecords(
>         that.get('rowsDisplayed'),
>         vertexProgressInfo,
>         ['progress']
>       );
>     }).catch(function(error) {
>       error.message = "Failed to fetch vertexProgress. Application Master " +
>           "(AM) is out of reach. Either it's down, or CORS is not enabled " +
>           "for YARN ResourceManager.";
>       Em.Logger.error(error);
>       var err = App.Helpers.misc.formatError(error);
>       var msg = 'Error code: %@, message: %@'.fmt(err.errCode, err.msg);
>       App.Helpers.ErrorBar.getInstance().show(msg, err.details);
>     });
>   }
> },
> {code}
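The vertexIds metadata above is built from the last '_'-separated segment of each running vertex's ID. That is presumably where the vertexID=01 in the URL comes from. The sketch below uses plain objects in place of Ember records. It assumes Tez vertex IDs of the form vertex_<clusterTimestamp>_<appSeq>_<dagIdx>_<vertexIdx>, and the sample IDs are made up. runningVertexIndices is a hypothetical name.

```javascript
// Sketch of the index extraction in _loadProgress, with plain objects standing
// in for Ember records. Assumes IDs like vertex_<ts>_<appSeq>_<dagIdx>_<vertexIdx>.
function runningVertexIndices(vertices) {
  return vertices
    .filter((v) => v.status === "RUNNING")
    // Same result as .split('_').splice(-1).pop(): the last segment of the ID.
    .map((v) => v.id.split("_").pop());
}

// Made-up sample vertices for one DAG.
const vertices = [
  { id: "vertex_1469000000000_0001_1_00", status: "SUCCEEDED" },
  { id: "vertex_1469000000000_0001_1_01", status: "RUNNING" },
  { id: "vertex_1469000000000_0001_1_02", status: "KILLED" },
];

console.log(runningVertexIndices(vertices).join(",")); // 01
```

An empty result corresponds to the runningVerticesIdx.length > 0 check, which means no request is sent at all.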
> The vertexProgress request uses AMInfo, whose response depends on what the
> loadApp method finds:
> {code}
> loadApp: function (store, appId, useCache) {
>   if (!useCache) {
>     App.Helpers.misc.removeRecord(store, 'appDetail', appId);
>     App.Helpers.misc.removeRecord(store, 'clusterApp', appId);
>   }
>   // Ask the RM (clusterApp) first; fall back to the history service (appDetail).
>   return store.find('clusterApp', appId).catch(function () {
>     return store.find('appDetail', appId);
>   }).catch(function (error) {
>     // Parenthesized so that fmt() applies to the whole concatenated message.
>     error.message = ("Couldn't get details of application %@. RM is not " +
>         "reachable, and history service is not enabled.").fmt(appId);
>     throw error;
>   });
> }
> {code}
> In the catch block, we could either check whether the response is JSON, or
> skip the vertexProgress request entirely, since at that point the
> application/AM is already known to have failed.
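The following is a minimal sketch of the second option. It assumes the application's YARN state (for example FAILED or KILLED) is available on the record that loadApp returns. shouldLoadVertexProgress and TERMINAL_APP_STATES are hypothetical names, not Tez UI code. FINISHED, FAILED and KILLED are YARN's terminal application states.

```javascript
// YARN's terminal application states: once the application reaches one of
// these, its AM has exited and can no longer serve REST calls.
const TERMINAL_APP_STATES = new Set(["FINISHED", "FAILED", "KILLED"]);

// Hypothetical guard (not Tez UI code): decide whether _loadProgress should
// ask the AM for vertexProgress at all.
function shouldLoadVertexProgress(appState, runningVertexIndices) {
  if (TERMINAL_APP_STATES.has(appState)) {
    // The RM proxy may redirect this call to the AHS, which returns HTML.
    return false;
  }
  return runningVertexIndices.length > 0;
}

console.log(shouldLoadVertexProgress("RUNNING", ["01"])); // true
console.log(shouldLoadVertexProgress("FAILED", ["01"])); // false
console.log(shouldLoadVertexProgress("RUNNING", [])); // false
```

Skipping the request avoids showing the error bar for an AM that is known to be gone. A response-type check could still serve as a fallback when the cached application state is stale.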
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)