Peter Vary commented on HIVE-17270:

??I'm not sure why the MiniSparkOnYarn cluster shows only 1 executor. My best guess is that the tests are started as soon as 1 executor has started (see QTestUtil#createSessionState). It's possible the test just finishes before the second executor even gets created.??

The strange thing is that it is not only the first query that shows 1 executor in the test output files, but all of them. When I ran the tests on a real cluster, however, the first run showed 1 executor and the following ones showed 2 (until I changed some Spark configuration, after which the next run showed only 1 executor again).

On the cluster I have unset {{spark.dynamicAllocation.enabled}} to match the configuration of the {{MiniSparkOnYarn}} tests.
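If QTestUtil#createSessionState really does proceed as soon as the first executor registers, the observed behavior (1 executor on the first run, 2 afterwards) would follow from a simple race. A minimal sketch of that suspicion, with hypothetical names, not Hive's actual code:

```java
// Hedged sketch (not Hive's actual code): if session startup only waits for
// the *first* executor to register, the executor count that lands in the log
// depends on timing, not on the configured spark.executor.instances=2.
public class ExecutorCountRace {
    // Simulates how many executors have registered with the driver so far.
    static int registeredExecutors = 0;

    // Stand-in for "wait until the session is usable": returns the count
    // observed at the moment the minimum is reached.
    static int waitForExecutors(int minExecutors) {
        while (registeredExecutors < minExecutors) {
            registeredExecutors++; // simulate another executor registering
        }
        return registeredExecutors;
    }

    public static void main(String[] args) {
        int firstRun = waitForExecutors(1);   // fast first query: sees 1
        registeredExecutors = 2;              // by the next query, both are up
        int secondRun = waitForExecutors(1);  // second query: sees 2
        System.out.println(firstRun + " executor(s), then " + secondRun);
        // prints "1 executor(s), then 2"
    }
}
```

This would explain the cluster runs, but not why *every* query in the qtest output files shows 1 executor.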

The number of reducers is printed by this (heavily elided):

{code}
private JSONObject outputMap(Map<?, ?> mp, boolean hasHeader, PrintStream out,
    boolean extended, boolean jsonOutput, int indent) throws Exception {
  ...
  boolean isFirst = true;
  for (SparkWork.Dependency dep : (List<SparkWork.Dependency>) ent.getValue()) {
    if (!isFirst) {
      out.print(", ");
    } else {
      out.print("<- ");
      isFirst = false;
    }
    ...
    out.print(" (");
    ...
    out.print(", ");
    ...
  }
  ...
  return jsonOutput ? json : null;
}
{code}

GenSparkUtils.getEdgeProperty sets this (elided):

{code}
public static SparkEdgeProperty getEdgeProperty(ReduceSinkOperator reduceSink,
    ReduceWork reduceWork) throws SemanticException {
  SparkEdgeProperty edgeProperty = new SparkEdgeProperty(...);
  ...
  return edgeProperty;
}
{code}

Which is set by SetSparkReducerParallelism#process (elided):

{code}
public Object process(Node nd, Stack<Node> stack,
    NodeProcessorCtx procContext, Object... nodeOutputs)
    throws SemanticException {
  ...
  LOG.info("Set parallelism for reduce sink " + sink + " to: " + numReducers
      + " (calculated)");
  ...
  return false;
}
{code}

I see this log line in the logs, and its values match the ones in the explain plans.
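For context, the "(calculated)" parallelism is typically derived from estimated input size (possibly adjusted by available cores) rather than read back from the live executor count, which is why the explain-plan numbers can stay stable even when the reported executor count varies. A hedged sketch of such a bytes-per-reducer heuristic, the general shape rather than Hive's exact formula:

```java
// Hedged sketch of a bytes-per-reducer heuristic (not Hive's exact code):
// the reducer count is driven by data-size estimates, capped by a maximum,
// so it is independent of how many executors happen to be registered.
public class ReducerEstimate {
    static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        int estimated = (int) Math.ceil((double) totalInputBytes / bytesPerReducer);
        return Math.max(1, Math.min(estimated, maxReducers));
    }

    public static void main(String[] args) {
        // 10 GB of input at 256 MB per reducer, capped at 1009 reducers
        System.out.println(estimateReducers(10L << 30, 256L << 20, 1009)); // prints 40
    }
}
```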

This is the depth I reached before I had to go home today :)
If there are no new pointers, I will dig deeper tomorrow :)


> Qtest results show wrong number of executors
> --------------------------------------------
>                 Key: HIVE-17270
>                 URL: https://issues.apache.org/jira/browse/HIVE-17270
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 3.0.0
>            Reporter: Peter Vary
>            Assignee: Peter Vary
> The hive-site.xml shows, that the TestMiniSparkOnYarnCliDriver uses 2 cores, 
> and 2 executor instances to run the queries. See: 
> https://github.com/apache/hive/blob/master/data/conf/spark/yarn-client/hive-site.xml#L233
> When reading the log files for the query tests, I see the following:
> {code}
> 2017-08-08T07:41:03,315  INFO [0381325d-2c8c-46fb-ab51-423defaddd84 main] 
> session.SparkSession: Spark cluster current has executors: 1, total cores: 2, 
> memory per executor: 512M, memoryFraction: 0.4
> {code}
> See: 
> When running the tests against a real cluster, I found that running an 
> explain query for the first time I see 1 executor, but running it for the 
> second time I see 2 executors.
> Also setting some spark configuration on the cluster resets this behavior. 
> For the first time I will see 1 executor, and for the second time I will see 
> 2 executors again.

This message was sent by Atlassian JIRA
