[
https://issues.apache.org/jira/browse/SPARK-13983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258139#comment-15258139
]
Cheng Lian edited comment on SPARK-13983 at 4/26/16 4:44 PM:
-------------------------------------------------------------
Here are my (incomplete) findings:
Configurations set using {{--hiveconf}} and {{--hivevar}} are set to the current
{{SessionState}} after [calling SessionManager.openSession
here|https://github.com/apache/spark/blob/branch-1.6/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLSessionManager.scala#L68-L70].
In 1.5, these configurations are populated implicitly since {{SessionState}} is
thread-local.
In 1.6, we create a new {{HiveContext}} using {{HiveContext.newSession}} under
multi-session mode, which then [creates a new execution Hive
client|https://github.com/apache/spark/blob/branch-1.6/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L119].
My theory is that {{ClientWrapper.newSession}} ignores the current
{{SessionState}} and simply creates a new one, so configurations set via CLI
flags are dropped.
I haven't completely verified the last point though.
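The suspected behavior can be illustrated with a small toy model (an assumption, not the real {{org.apache.hadoop.hive.ql.session.SessionState}} API): in the 1.5-style flow the current state is thread-local, so configurations attached by {{openSession}} are implicitly visible, whereas a 1.6-style {{newSession}} that constructs a fresh state from scratch never sees them.

```scala
import scala.collection.mutable

// Toy stand-in for Hive's SessionState (simplified model; the real
// class carries far more than a config map).
class SessionState {
  val conf: mutable.Map[String, String] = mutable.Map.empty
}

// 1.5-style: the current state is thread-local, so values set by
// openSession() are implicitly visible to later reads on the thread.
val current = new ThreadLocal[SessionState] {
  override def initialValue(): SessionState = new SessionState
}

// openSession(): a --hiveconf value lands in the current SessionState.
current.get.conf("spark.sql.shuffle.partitions") = "3"

// 1.6-style newSession(): builds a fresh SessionState from scratch and
// ignores the current one, so the CLI-set value is silently lost.
val fresh = new SessionState

println(current.get.conf.get("spark.sql.shuffle.partitions")) // Some(3)
println(fresh.conf.get("spark.sql.shuffle.partitions"))       // None
```

If this model matches what {{ClientWrapper.newSession}} does, any fix would need to copy the per-session overrides into the new state explicitly rather than rely on thread-locality.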
> HiveThriftServer2 cannot get "--hiveconf" or "--hivevar" variables since
> version 1.6 (both multi-session and single session)
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-13983
> URL: https://issues.apache.org/jira/browse/SPARK-13983
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.0, 1.6.1
> Environment: ubuntu, spark 1.6.0 standalone, spark 1.6.1 standalone
> (tried spark branch-1.6 snapshot as well)
> compiled with scala 2.10.5 and hadoop 2.6
> (-Phadoop-2.6 -Psparkr -Phive -Phive-thriftserver)
> Reporter: Teng Qiu
> Assignee: Cheng Lian
>
> HiveThriftServer2 should be able to get "--hiveconf" or "--hivevar"
> variables from the JDBC client, either from beeline command-line parameters,
> such as
> {{beeline --hiveconf spark.sql.shuffle.partitions=3 --hivevar
> db_name=default}}
> or from JDBC connection string, like
> {{jdbc:hive2://localhost:10000?spark.sql.shuffle.partitions=3#db_name=default}}
> This worked in Spark 1.5.x, but after upgrading to 1.6 it no longer works.
> To reproduce this issue, connect to HiveThriftServer2 with beeline:
> {code}
> bin/beeline -u jdbc:hive2://localhost:10000 \
> --hiveconf spark.sql.shuffle.partitions=3 \
> --hivevar db_name=default
> {code}
> or
> {code}
> bin/beeline -u
> jdbc:hive2://localhost:10000?spark.sql.shuffle.partitions=3#db_name=default
> {code}
> you will get the following results:
> {code}
> 0: jdbc:hive2://localhost:10000> set spark.sql.shuffle.partitions;
> +-------------------------------+--------+--+
> | key | value |
> +-------------------------------+--------+--+
> | spark.sql.shuffle.partitions | 200 |
> +-------------------------------+--------+--+
> 1 row selected (0.192 seconds)
> 0: jdbc:hive2://localhost:10000> use ${db_name};
> Error: org.apache.spark.sql.AnalysisException: cannot recognize input near
> '$' '{' 'db_name' in switch database statement; line 1 pos 4 (state=,code=0)
> {code}
> -
> but this bug does not affect current versions of the spark-sql CLI; the
> following commands work:
> {code}
> bin/spark-sql --master local[2] \
> --hiveconf spark.sql.shuffle.partitions=3 \
> --hivevar db_name=default
> spark-sql> set spark.sql.shuffle.partitions
> spark.sql.shuffle.partitions 3
> Time taken: 1.037 seconds, Fetched 1 row(s)
> spark-sql> use ${db_name};
> OK
> Time taken: 1.697 seconds
> {code}
> So I think it may be caused by this change:
> https://github.com/apache/spark/pull/8909 ( [SPARK-10810] [SPARK-10902] [SQL]
> Improve session management in SQL )
> Perhaps when calling {{hiveContext.newSession}}, the variables from
> {{sessionConf}} are not loaded into the new session?
> (https://github.com/apache/spark/pull/8909/files#diff-8f8b7f4172e8a07ff20a4dbbbcc57b1dR69)
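The reporter's hypothesis suggests a possible fix direction, sketched below as a toy model. All names here ({{ToyContext}}, {{SQLConf}}, {{openSession}}) are hypothetical simplifications, not Spark's actual API: when opening a session in multi-session mode, explicitly copy the per-session overrides (the {{--hiveconf}} / {{--hivevar}} values) into the freshly created context instead of relying on a thread-local {{SessionState}}.

```scala
import scala.collection.mutable

// Minimal stand-in for a per-context SQL configuration with a default.
class SQLConf {
  private val settings =
    mutable.Map[String, String]("spark.sql.shuffle.partitions" -> "200")
  def setConfString(key: String, value: String): Unit = settings(key) = value
  def getConfString(key: String): String = settings(key)
}

// Stand-in for the context returned by hiveContext.newSession().
class ToyContext { val conf = new SQLConf }

// Hypothetical openSession(): create the new context, then explicitly
// apply the session's CLI-supplied overrides -- the step the reporter
// suspects is missing in 1.6.
def openSession(sessionConf: Map[String, String]): ToyContext = {
  val ctx = new ToyContext
  sessionConf.foreach { case (k, v) => ctx.conf.setConfString(k, v) }
  ctx
}

val session = openSession(Map("spark.sql.shuffle.partitions" -> "3"))
println(session.conf.getConfString("spark.sql.shuffle.partitions")) // 3
```

With the explicit copy in place, a subsequent {{set spark.sql.shuffle.partitions}} in the session would report 3 rather than the default 200, matching the 1.5.x behavior described above.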
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]