[ https://issues.apache.org/jira/browse/SPARK-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299568#comment-15299568 ]
Yi Zhou edited comment on SPARK-15345 at 5/25/16 6:56 AM:
----------------------------------------------------------
1) Spark SQL can't find the existing Hive metastore databases in the spark-sql shell when issuing 'show databases;'.
2) It always reports that the database already exists (and I saw a local Derby metastore_db folder in the current directory). It seems that Spark SQL can't read the Hive conf (e.g., hive-site.xml).
3) Key configurations in spark-defaults.conf:
{code}
spark.sql.hive.metastore.version=1.1.0
spark.sql.hive.metastore.jars=/usr/lib/hive/lib/*:/usr/lib/hadoop/client/*
spark.executor.extraClassPath=/etc/hive/conf
spark.driver.extraClassPath=/etc/hive/conf
spark.yarn.jars=local:/usr/lib/spark/jars/*
{code}
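The local Derby metastore_db folder appearing in the current directory suggests Spark fell back to an embedded metastore because hive-site.xml was never picked up from /etc/hive/conf. For reference, the property that points Hive clients at a remote metastore lives in hive-site.xml; a minimal illustrative fragment (the thrift host and port below are placeholders, not values from this report) would look like:
{code}
<configuration>
  <!-- When this is unset, Hive falls back to an embedded Derby metastore
       in the current working directory (the metastore_db folder seen above). -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
{code}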
{code}
16/05/23 09:48:24 ERROR metastore.RetryingHMSHandler: AlreadyExistsException(message:Database test_sparksql already exists)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:898)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:133)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
	at com.sun.proxy.$Proxy34.create_database(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:645)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:91)
	at com.sun.proxy.$Proxy35.createDatabase(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:341)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply$mcV$sp(HiveClientImpl.scala:289)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:289)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:289)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:260)
	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:207)
	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:206)
	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:249)
	at org.apache.spark.sql.hive.client.HiveClientImpl.createDatabase(HiveClientImpl.scala:288)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply$mcV$sp(HiveExternalCatalog.scala:94)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:94)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:94)
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:68)
	at org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:93)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:142)
	at org.apache.spark.sql.execution.command.CreateDatabaseCommand.run(ddl.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:57)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:55)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:69)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:85)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:85)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:187)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:168)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:529)
	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:649)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:325)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:311)
	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:409)
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:425)
	at org.apache.hadoop.hive.cli.CliDriver.processInitFiles(CliDriver.java:436)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:158)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:724)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}
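This issue's summary is that SparkSession's conf doesn't take effect when a SparkContext already exists. The caching behavior behind that symptom can be modeled in a few lines of plain Python; this is an illustrative sketch of the getOrCreate pattern, not Spark's actual code:

```python
# Minimal pure-Python model of the getOrCreate caching pattern that
# SparkSession.builder follows. NOT Spark's implementation -- just a sketch of
# why options passed to a builder can be silently dropped when a session (or
# SparkContext) already exists, which is the symptom reported in SPARK-15345.

class Session:
    _active = None  # process-wide cached instance, like the active SparkSession

    def __init__(self, conf):
        self.conf = dict(conf)

    @classmethod
    def builder(cls):
        return Builder(cls)


class Builder:
    def __init__(self, session_cls):
        self._session_cls = session_cls
        self._options = {}

    def config(self, key, value):
        self._options[key] = value
        return self

    def get_or_create(self):
        # If a session is already active, return it unchanged: the options
        # collected on this builder are never applied to it.
        if self._session_cls._active is None:
            self._session_cls._active = self._session_cls(self._options)
        return self._session_cls._active


first = Session.builder().config("spark.sql.catalogImplementation", "in-memory").get_or_create()
second = Session.builder().config("spark.sql.catalogImplementation", "hive").get_or_create()

# The second builder's "hive" setting was dropped; the cached session wins.
print(second is first)                                 # True
print(second.conf["spark.sql.catalogImplementation"])  # in-memory
```

Under this model, any configuration that must reach the Hive client (metastore version, extra classpath entries) has to be present before the first context is created, which matches the spark-defaults.conf approach above.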
> SparkSession's conf doesn't take effect when there's already an existing
> SparkContext
> -------------------------------------------------------------------------------------
>
> Key: SPARK-15345
> URL: https://issues.apache.org/jira/browse/SPARK-15345
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Reporter: Piotr Milanowski
> Assignee: Reynold Xin
> Priority: Blocker
> Fix For: 2.0.0
>
>
> I am working with branch-2.0; Spark is compiled with Hive support (-Phive
> and -Phive-thriftserver).
> I am trying to access databases using this snippet:
> {code}
> from pyspark.sql import HiveContext
> hc = HiveContext(sc)
> hc.sql("show databases").collect()
> [Row(result='default')]
> {code}
> This means that Spark doesn't find any of the databases specified in the
> configuration. Using the same configuration (i.e. hive-site.xml and
> core-site.xml) in Spark 1.6 and launching the above snippet, I can print out
> the existing databases.
> When run in DEBUG mode, this is what Spark (2.0) prints out:
> {code}
> 16/05/16 12:17:47 INFO SparkSqlParser: Parsing command: show databases
> 16/05/16 12:17:47 DEBUG SimpleAnalyzer:
> === Result of Batch Resolution ===
> !'Project [unresolveddeserializer(createexternalrow(if (isnull(input[0,
> string])) null else input[0, string].toString,
> StructField(result,StringType,false)), result#2) AS #3] Project
> [createexternalrow(if (isnull(result#2)) null else result#2.toString,
> StructField(result,StringType,false)) AS #3]
> +- LocalRelation [result#2]
>
> +- LocalRelation [result#2]
>
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure <function1>
> (org.apache.spark.sql.Dataset$$anonfun$53) +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared fields: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public static final long
> org.apache.spark.sql.Dataset$$anonfun$53.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner: private final
> org.apache.spark.sql.types.StructType
> org.apache.spark.sql.Dataset$$anonfun$53.structType$1
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public final java.lang.Object
> org.apache.spark.sql.Dataset$$anonfun$53.apply(java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public final java.lang.Object
> org.apache.spark.sql.Dataset$$anonfun$53.apply(org.apache.spark.sql.catalyst.InternalRow)
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer objects: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + populating accessed fields because
> this is the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + fields accessed by starting
> closure: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + there are no enclosing objects!
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ closure <function1>
> (org.apache.spark.sql.Dataset$$anonfun$53) is now cleaned +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure <function1>
> (org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1)
> +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared fields: 1
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public static final long
> org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public final java.lang.Object
> org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public final
> org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler
> org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(scala.collection.Iterator)
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer objects: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + populating accessed fields because
> this is the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + fields accessed by starting
> closure: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + there are no enclosing objects!
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ closure <function1>
> (org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1)
> is now cleaned +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure <function1>
> (org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13) +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared fields: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public static final long
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner: private final
> org.apache.spark.rdd.RDD$$anonfun$collect$1
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.$outer
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public final java.lang.Object
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public final java.lang.Object
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(scala.collection.Iterator)
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer classes: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:
> org.apache.spark.rdd.RDD$$anonfun$collect$1
> 16/05/16 12:17:47 DEBUG ClosureCleaner: org.apache.spark.rdd.RDD
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer objects: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner: <function0>
> 16/05/16 12:17:47 DEBUG ClosureCleaner: MapPartitionsRDD[5] at collect
> at <stdin>:1
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + populating accessed fields because
> this is the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + fields accessed by starting
> closure: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner: (class
> org.apache.spark.rdd.RDD$$anonfun$collect$1,Set($outer))
> 16/05/16 12:17:47 DEBUG ClosureCleaner: (class
> org.apache.spark.rdd.RDD,Set(org$apache$spark$rdd$RDD$$evidence$1))
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + outermost object is not a closure
> or REPL line object, so do not clone it: (class
> org.apache.spark.rdd.RDD,MapPartitionsRDD[5] at collect at <stdin>:1)
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + cloning the object <function0> of
> class org.apache.spark.rdd.RDD$$anonfun$collect$1
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + cleaning cloned closure
> <function0> recursively (org.apache.spark.rdd.RDD$$anonfun$collect$1)
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure <function0>
> (org.apache.spark.rdd.RDD$$anonfun$collect$1) +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared fields: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public static final long
> org.apache.spark.rdd.RDD$$anonfun$collect$1.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner: private final
> org.apache.spark.rdd.RDD org.apache.spark.rdd.RDD$$anonfun$collect$1.$outer
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public org.apache.spark.rdd.RDD
> org.apache.spark.rdd.RDD$$anonfun$collect$1.org$apache$spark$rdd$RDD$$anonfun$$$outer()
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public final java.lang.Object
> org.apache.spark.rdd.RDD$$anonfun$collect$1.apply()
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + inner classes: 1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer classes: 1
> 16/05/16 12:17:47 DEBUG ClosureCleaner: org.apache.spark.rdd.RDD
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer objects: 1
> 16/05/16 12:17:47 DEBUG ClosureCleaner: MapPartitionsRDD[5] at collect
> at <stdin>:1
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + fields accessed by starting
> closure: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner: (class
> org.apache.spark.rdd.RDD$$anonfun$collect$1,Set($outer))
> 16/05/16 12:17:47 DEBUG ClosureCleaner: (class
> org.apache.spark.rdd.RDD,Set(org$apache$spark$rdd$RDD$$evidence$1))
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + outermost object is not a closure
> or REPL line object, so do not clone it: (class
> org.apache.spark.rdd.RDD,MapPartitionsRDD[5] at collect at <stdin>:1)
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ closure <function0>
> (org.apache.spark.rdd.RDD$$anonfun$collect$1) is now cleaned +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ closure <function1>
> (org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13) is now cleaned +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure <function2>
> (org.apache.spark.SparkContext$$anonfun$runJob$5) +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared fields: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public static final long
> org.apache.spark.SparkContext$$anonfun$runJob$5.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner: private final scala.Function1
> org.apache.spark.SparkContext$$anonfun$runJob$5.cleanedFunc$1
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public final java.lang.Object
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(java.lang.Object,java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner: public final java.lang.Object
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(org.apache.spark.TaskContext,scala.collection.Iterator)
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + outer objects: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + populating accessed fields because
> this is the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + fields accessed by starting
> closure: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner: + there are no enclosing objects!
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ closure <function2>
> (org.apache.spark.SparkContext$$anonfun$runJob$5) is now cleaned +++
> 16/05/16 12:17:47 INFO SparkContext: Starting job: collect at <stdin>:1
> 16/05/16 12:17:47 INFO DAGScheduler: Got job 1 (collect at <stdin>:1) with 1
> output partitions
> 16/05/16 12:17:47 INFO DAGScheduler: Final stage: ResultStage 1 (collect at
> <stdin>:1)
> 16/05/16 12:17:47 INFO DAGScheduler: Parents of final stage: List()
> 16/05/16 12:17:47 INFO DAGScheduler: Missing parents: List()
> 16/05/16 12:17:47 DEBUG DAGScheduler: submitStage(ResultStage 1)
> 16/05/16 12:17:47 DEBUG DAGScheduler: missing: List()
> 16/05/16 12:17:47 INFO DAGScheduler: Submitting ResultStage 1
> (MapPartitionsRDD[5] at collect at <stdin>:1), which has no missing parents
> 16/05/16 12:17:47 DEBUG DAGScheduler: submitMissingTasks(ResultStage 1)
> 16/05/16 12:17:47 INFO MemoryStore: Block broadcast_1 stored as values in
> memory (estimated size 3.1 KB, free 5.8 GB)
> 16/05/16 12:17:47 DEBUG BlockManager: Put block broadcast_1 locally took 1 ms
> 16/05/16 12:17:47 DEBUG BlockManager: Putting block broadcast_1 without
> replication took 1 ms
> 16/05/16 12:17:47 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes
> in memory (estimated size 1856.0 B, free 5.8 GB)
> 16/05/16 12:17:47 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory
> on 188.165.13.157:35738 (size: 1856.0 B, free: 5.8 GB)
> 16/05/16 12:17:47 DEBUG BlockManagerMaster: Updated info of block
> broadcast_1_piece0
> 16/05/16 12:17:47 DEBUG BlockManager: Told master about block
> broadcast_1_piece0
> 16/05/16 12:17:47 DEBUG BlockManager: Put block broadcast_1_piece0 locally
> took 1 ms
> 16/05/16 12:17:47 DEBUG BlockManager: Putting block broadcast_1_piece0
> without replication took 2 ms
> 16/05/16 12:17:47 INFO SparkContext: Created broadcast 1 from broadcast at
> DAGScheduler.scala:1012
> 16/05/16 12:17:47 INFO DAGScheduler: Submitting 1 missing tasks from
> ResultStage 1 (MapPartitionsRDD[5] at collect at <stdin>:1)
> 16/05/16 12:17:47 DEBUG DAGScheduler: New pending partitions: Set(0)
> 16/05/16 12:17:47 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
> 16/05/16 12:17:47 DEBUG TaskSetManager: Epoch for TaskSet 1.0: 0
> 16/05/16 12:17:47 DEBUG TaskSetManager: Valid locality levels for TaskSet
> 1.0: NO_PREF, ANY
> 16/05/16 12:17:47 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1,
> runningTasks: 0
> 16/05/16 12:17:47 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1,
> xxx3, partition 0, PROCESS_LOCAL, 5542 bytes)
> 16/05/16 12:17:47 DEBUG TaskSetManager: No tasks for locality level NO_PREF,
> so moving to locality level ANY
> 16/05/16 12:17:47 INFO SparkDeploySchedulerBackend: Launching task 1 on
> executor id: 0 hostname: xxx3.
> 16/05/16 12:17:48 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1,
> runningTasks: 1
> 16/05/16 12:17:48 DEBUG BlockManager: Getting local block broadcast_1_piece0
> as bytes
> 16/05/16 12:17:48 DEBUG BlockManager: Level for block broadcast_1_piece0 is
> StorageLevel(disk=true, memory=true, offheap=false, deserialized=false,
> replication=1)
> 16/05/16 12:17:48 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory
> on 188.165.13.158:53616 (size: 1856.0 B, free: 14.8 GB)
> 16/05/16 12:17:49 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1,
> runningTasks: 1
> 16/05/16 12:17:50 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1,
> runningTasks: 1
> 16/05/16 12:17:50 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_1,
> runningTasks: 0
> 16/05/16 12:17:50 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1)
> in 2156 ms on xxx3 (1/1)
> 16/05/16 12:17:50 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks
> have all completed, from pool
> 16/05/16 12:17:50 INFO DAGScheduler: ResultStage 1 (collect at <stdin>:1)
> finished in 2.158 s
> 16/05/16 12:17:50 DEBUG DAGScheduler: After removal of stage 1, remaining
> stages = 0
> 16/05/16 12:17:50 INFO DAGScheduler: Job 1 finished: collect at <stdin>:1,
> took 2.174808 s
> {code}
> I can't see any information on Hive connection in this trace.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)