[ 
https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047965#comment-15047965
 ] 

Tao Li edited comment on SPARK-12179 at 12/9/15 6:56 AM:
---------------------------------------------------------

The row_number implementation is as follows:

package UDF;

import org.apache.hadoop.hive.ql.exec.UDF;

/**
 * Emulates row_number() for rows that arrive sorted by the compared columns.
 * The counter and the last-seen column values are kept in static fields, so
 * each JVM (i.e. each executor) maintains its own independent state.
 */
public class row_number extends UDF {
  // Maximum number of columns that can be compared per call.
  private static final int MAX_VALUE = 50;
  // Column values of the previous row, used to detect the start of a new group.
  private static String[] comparedColumn = new String[MAX_VALUE];
  // Running row number within the current group.
  private static int rowNum = 1;

  public int evaluate(Object[] args) {
    String[] columnValue = new String[args.length];
    for (int i = 0; i < args.length; i++) {
      columnValue[i] = (args[i] == null ? "" : args[i].toString());
    }
    // First row ever seen: remember its column values as the current group.
    if (rowNum == 1) {
      for (int i = 0; i < columnValue.length; i++) {
        comparedColumn[i] = columnValue[i];
      }
    }
    // If any compared column changed, a new group starts: reset the counter.
    for (int i = 0; i < columnValue.length; i++) {
      if (!comparedColumn[i].equals(columnValue[i])) {
        for (int j = 0; j < columnValue.length; j++) {
          comparedColumn[j] = columnValue[j];
        }
        rowNum = 1;
        return rowNum++;
      }
    }
    return rowNum++;
  }
}
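
To illustrate how the UDF behaves, here is a minimal, hypothetical driver (not part of the original job; the class name and the sample keys are made up) that feeds it one sorted sequence of keys. The counter increments within a run of identical keys and restarts whenever the compared value changes:

package UDF;

public class RowNumberDemo {
  public static void main(String[] args) {
    row_number udf = new row_number();
    // One sorted "partition" of keys; expected output: 1, 2, 3, 1, 2, 1.
    String[][] rows = {
      {"user_a"}, {"user_a"}, {"user_a"},
      {"user_b"}, {"user_b"},
      {"user_c"}
    };
    for (String[] row : rows) {
      System.out.println(row[0] + " -> " + udf.evaluate(row));
    }
  }
}

Because the counter and the remembered columns are static fields, the UDF only numbers rows correctly when all rows of a group reach the same JVM in sorted order (for example via DISTRIBUTE BY / SORT BY after the function is registered with CREATE TEMPORARY FUNCTION row_number AS 'UDF.row_number').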



> Spark SQL get different result with the same code
> -------------------------------------------------
>
>                 Key: SPARK-12179
>                 URL: https://issues.apache.org/jira/browse/SPARK-12179
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.3.0, 1.3.1, 1.3.2, 1.4.0, 1.4.1, 1.4.2, 1.5.0, 1.5.1, 1.5.2, 1.5.3
>         Environment: hadoop version: 2.5.0-cdh5.3.2
> spark version: 1.5.3
> run mode: yarn-client
>            Reporter: Tao Li
>            Priority: Critical
>
> I run the SQL in yarn-client mode, but I get a different result each time.
> As you can see in the example below, I get different shuffle write sizes for the
> same shuffle read in two runs of the same code.
> Some of my Spark apps run well, but some always hit this problem, and I have seen
> it on Spark 1.3, 1.4 and 1.5.
> Can you give me some suggestions about the possible causes, or about how I can
> figure out the problem?
> 1. First Run
> Details for Stage 9 (Attempt 0)
> Total Time Across All Tasks: 5.8 min
> Shuffle Read: 24.4 MB / 205399 records
> Shuffle Write: 6.8 MB / 54934 records
> 2. Second Run
> Details for Stage 9 (Attempt 0)
> Total Time Across All Tasks: 5.6 min
> Shuffle Read: 24.4 MB / 205399 records
> Shuffle Write: 6.8 MB / 54905 records


