[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047965#comment-15047965 ]
Tao Li edited comment on SPARK-12179 at 12/9/15 6:56 AM:
---------------------------------------------------------
The row_number implementation is as follows:

package UDF;

import org.apache.hadoop.hive.ql.exec.UDF;

public class row_number extends UDF {
    private static int MAX_VALUE = 50;
    private static String[] comparedColumn = new String[MAX_VALUE];
    private static int rowNum = 1;

    public int evaluate(Object[] args) {
        String[] columnValue = new String[args.length];
        for (int i = 0; i < args.length; i++) {
            columnValue[i] = (args[i] == null ? "" : args[i].toString());
        }
        // First row seen by this JVM: remember the key columns.
        if (rowNum == 1) {
            for (int i = 0; i < columnValue.length; i++) {
                comparedColumn[i] = columnValue[i];
            }
        }
        // Key changed: remember the new key and restart numbering at 1.
        for (int i = 0; i < columnValue.length; i++) {
            if (!comparedColumn[i].equals(columnValue[i])) {
                for (int j = 0; j < columnValue.length; j++) {
                    comparedColumn[j] = columnValue[j];
                }
                rowNum = 1;
                return rowNum++;
            }
        }
        return rowNum++;
    }
}

was (Author: litao1990):
The earlier revision of the comment contained the same code, except that the class was named RowNumber.

> Spark SQL get different result with the same code
> -------------------------------------------------
>
>                 Key: SPARK-12179
>                 URL: https://issues.apache.org/jira/browse/SPARK-12179
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.3.0, 1.3.1, 1.3.2, 1.4.0, 1.4.1, 1.4.2, 1.5.0, 1.5.1, 1.5.2, 1.5.3
>         Environment: hadoop version: 2.5.0-cdh5.3.2
>                      spark version: 1.5.3
>                      run mode: yarn-client
>            Reporter: Tao Li
>            Priority: Critical
>
> I run the SQL in yarn-client mode but get a different result each time.
> As the example shows, two runs of the same code produce the same shuffle read but different shuffle writes.
> Some of my Spark apps run fine, but some always hit this problem, and I have seen it on Spark 1.3, 1.4 and 1.5.
> Can you give me some suggestions about the possible causes, or about how to track the problem down?
>
> 1. First Run
> Details for Stage 9 (Attempt 0)
> Total Time Across All Tasks: 5.8 min
> Shuffle Read: 24.4 MB / 205399
> Shuffle Write: 6.8 MB / 54934
>
> 2. Second Run
> Details for Stage 9 (Attempt 0)
> Total Time Across All Tasks: 5.6 min
> Shuffle Read: 24.4 MB / 205399
> Shuffle Write: 6.8 MB / 54905

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
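The nondeterminism reported above is consistent with the UDF's design: it keeps its row counter and last-seen key in static fields, so the numbering depends on the order in which rows happen to reach the JVM, and tasks sharing an executor JVM share (and corrupt) each other's state. Below is a minimal sketch of that failure mode with no Spark or Hive dependency; the class name StatefulRowNumber, the single-column key, and the sample rows are hypothetical, chosen only to mirror the static-counter pattern of the UDF:

```java
import java.util.Arrays;

// Sketch of the stateful row_number pattern: a counter and last-seen key
// shared by every call in the JVM, exactly like the UDF's static fields.
public class StatefulRowNumber {
    private static String comparedKey = null;
    private static int rowNum = 1;

    // Returns 1 for the first row of each new key, then increments.
    static int evaluate(String key) {
        if (comparedKey == null || !comparedKey.equals(key)) {
            comparedKey = key;
            rowNum = 1;
        }
        return rowNum++;
    }

    // Test helper: clear the static state between scenarios.
    static void reset() {
        comparedKey = null;
        rowNum = 1;
    }

    public static void main(String[] args) {
        // Rows grouped by key, as a single task scanning sorted input sees them.
        String[] ordered = {"a", "a", "b", "b"};
        // The same rows with calls interleaved, as two tasks sharing the JVM
        // (or a different partition order) would produce.
        String[] interleaved = {"a", "b", "a", "b"};

        int[] r1 = new int[ordered.length];
        for (int i = 0; i < ordered.length; i++) {
            r1[i] = evaluate(ordered[i]);
        }
        reset();
        int[] r2 = new int[interleaved.length];
        for (int i = 0; i < interleaved.length; i++) {
            r2[i] = evaluate(interleaved[i]);
        }

        // Same rows, different call order, different row numbers.
        System.out.println(Arrays.toString(r1)); // [1, 2, 1, 2]
        System.out.println(Arrays.toString(r2)); // [1, 1, 1, 1]
    }
}
```

Because the counter resets whenever the key changes, any reordering of rows changes the output, which would explain identical shuffle reads producing different shuffle writes. A deterministic alternative (available since Spark 1.4) is the built-in window function, e.g. ROW_NUMBER() OVER (PARTITION BY key ORDER BY ts), which carries no JVM-level state.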