[ https://issues.apache.org/jira/browse/TEZ-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482003#comment-14482003 ]

Sumeet Singh commented on TEZ-2278:
-----------------------------------

Ran Pig and Hive on Tez against the Lahman Baseball Database. Attaching the Pig
script; Hive has the same problem. Note the time discrepancies between the
vertex/task views and the swimlane view in the attached screenshots. Based on
the logs, the swimlane view appears to be correct, but the task and vertex
views are not.

Pig script to reproduce the issue (download the data from seanlahman.com):
-- Load data with batting scores and player name
Batting = LOAD '/user/sumeetsi/TezTalk/lahman-csv_2015-01-24/Batting.csv' USING PigStorage(',');
PlayerName = LOAD '/user/sumeetsi/TezTalk/lahman-csv_2015-01-24/Master.csv' USING PigStorage(',');

-- Generate player first and last name with playerID
player_name = FOREACH PlayerName GENERATE $0 AS ID, $13 AS first, $14 AS last;

-- Calculate raw runs for each player by year
raw_runs = FILTER Batting BY $7>0;
runs = FOREACH raw_runs GENERATE $0 AS playerID, $1 AS year, $7 AS runs;
grp_data = GROUP runs BY (year);

-- Calculate max runs for each year
max_runs = FOREACH grp_data GENERATE group AS grp, MAX(runs.runs) AS max_runs;

-- Join the result with playerID
join_max_runs = JOIN max_runs BY ($0, max_runs), runs BY (year, runs);

-- Join playerID with player_name
join_player_name = JOIN join_max_runs BY $2, player_name BY $0;

-- Store output as year, runs, and first and last name
interim_output = FOREACH join_player_name GENERATE $0 AS year, $1 AS runs, $6 AS first, $7 AS last;
sorted_output = ORDER interim_output BY $0 desc;
STORE sorted_output INTO '/user/sumeetsi/TezTalk/Lahman_Pig_Example.out';

> Tez UI start/end time and duration shown are wrong for tasks
> ------------------------------------------------------------
>
>                 Key: TEZ-2278
>                 URL: https://issues.apache.org/jira/browse/TEZ-2278
>             Project: Apache Tez
>          Issue Type: Bug
>          Components: UI
>    Affects Versions: 0.6.0
>            Reporter: Rohini Palaniswamy
>
>  Observing a lot of time discrepancies between the vertex, task and swimlane views.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)