Repository: spark
Updated Branches:
  refs/heads/branch-2.0 9ff03fa23 -> bdd27d1ea
[SPARK-17783][SQL][BACKPORT-2.0] Hide Credentials in CREATE and DESC FORMATTED/EXTENDED a PERSISTENT/TEMP Table for JDBC

### What changes were proposed in this pull request?

This PR backports https://github.com/apache/spark/pull/15358 to Spark 2.0.

------

We should never expose credentials in the output of the EXPLAIN and DESC FORMATTED/EXTENDED commands. However, the commands below did expose them. As noted in the related PR https://github.com/apache/spark/pull/10452:

> URL patterns to specify credentials seem to vary between different databases.

Thus, we hide the whole `url` value whenever it contains the keyword `password`. We also hide the `password` property itself.

Consider the following commands:

```SQL
CREATE TABLE tab1
USING org.apache.spark.sql.jdbc
OPTIONS (
  url 'jdbc:h2:mem:testdb0;user=testUser;password=testPass',
  dbtable 'TEST.PEOPLE',
  user 'testUser',
  password '$password')

DESC FORMATTED tab1
DESC EXTENDED tab1
```

Before the fix:

- The output of the SQL statement EXPLAIN:
```
== Physical Plan ==
ExecutedCommand
   +- CreateDataSourceTableCommand CatalogTable(
        Table: `tab1`
        Created: Wed Nov 16 23:00:10 PST 2016
        Last Access: Wed Dec 31 15:59:59 PST 1969
        Type: MANAGED
        Provider: org.apache.spark.sql.jdbc
        Storage(Properties: [url=jdbc:h2:mem:testdb0;user=testUser;password=testPass, dbtable=TEST.PEOPLE, user=testUser, password=testPass])), false
```

- The output of `DESC FORMATTED`:
```
...
|Storage Desc Parameters:    |                                                                  |       |
|  url                       |jdbc:h2:mem:testdb0;user=testUser;password=testPass              |       |
|  dbtable                   |TEST.PEOPLE                                                       |       |
|  user                      |testUser                                                          |       |
|  password                  |testPass                                                          |       |
+----------------------------+------------------------------------------------------------------+-------+
```

- The output of `DESC EXTENDED`:
```
|# Detailed Table Information|CatalogTable(
        Table: `default`.`tab1`
        Created: Wed Nov 16 23:00:10 PST 2016
        Last Access: Wed Dec 31 15:59:59 PST 1969
        Type: MANAGED
        Schema: [StructField(NAME,StringType,false), StructField(THEID,IntegerType,false)]
        Provider: org.apache.spark.sql.jdbc
        Storage(Location: file:/Users/xiaoli/IdeaProjects/sparkDelivery/spark-warehouse/tab1, Properties: [url=jdbc:h2:mem:testdb0;user=testUser;password=testPass, dbtable=TEST.PEOPLE, user=testUser, password=testPass]))|       |
```

After the fix:

- The output of the SQL statement EXPLAIN:
```
== Physical Plan ==
ExecutedCommand
   +- CreateDataSourceTableCommand CatalogTable(
        Table: `tab1`
        Created: Wed Nov 16 22:43:49 PST 2016
        Last Access: Wed Dec 31 15:59:59 PST 1969
        Type: MANAGED
        Provider: org.apache.spark.sql.jdbc
        Storage(Properties: [url=###, dbtable=TEST.PEOPLE, user=testUser, password=###])), false
```

- The output of `DESC FORMATTED`:
```
...
|Storage Desc Parameters:    |                                                                  |       |
|  url                       |###                                                               |       |
|  dbtable                   |TEST.PEOPLE                                                       |       |
|  user                      |testUser                                                          |       |
|  password                  |###                                                               |       |
+----------------------------+------------------------------------------------------------------+-------+
```

- The output of `DESC EXTENDED`:
```
|# Detailed Table Information|CatalogTable(
        Table: `default`.`tab1`
        Created: Wed Nov 16 22:43:49 PST 2016
        Last Access: Wed Dec 31 15:59:59 PST 1969
        Type: MANAGED
        Schema: [StructField(NAME,StringType,false), StructField(THEID,IntegerType,false)]
        Provider: org.apache.spark.sql.jdbc
        Storage(Location: file:/Users/xiaoli/IdeaProjects/sparkDelivery/spark-warehouse/tab1, Properties: [url=###, dbtable=TEST.PEOPLE, user=testUser, password=###]))|       |
```

### How was this patch tested?

Added test cases.

Author: gatorsmile <[email protected]>

Closes #16047 from gatorsmile/backPortSPARK-17783.
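For reference, the masking rule described above is implemented in `CatalogUtils.maskCredentials` (full diff below). A minimal sketch of how it behaves, assuming a build of this branch on the classpath:

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogUtils

// The JDBC options from the example above, with credentials embedded both
// in the url and as a separate property.
val options = Map(
  "url" -> "jdbc:h2:mem:testdb0;user=testUser;password=testPass",
  "dbtable" -> "TEST.PEOPLE",
  "user" -> "testUser",
  "password" -> "testPass")

// `password` keys are always masked; `url` is masked because its value
// contains the keyword "password". Expected (modulo map ordering):
//   Map(url -> ###, dbtable -> TEST.PEOPLE, user -> testUser, password -> ###)
println(CatalogUtils.maskCredentials(options))
```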
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bdd27d1e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bdd27d1e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bdd27d1e

Branch: refs/heads/branch-2.0
Commit: bdd27d1eae023b19d943f6445e83dfd92967ca8d
Parents: 9ff03fa
Author: gatorsmile <[email protected]>
Authored: Tue Nov 29 02:59:03 2016 -0800
Committer: Herman van Hovell <[email protected]>
Committed: Tue Nov 29 02:59:03 2016 -0800

----------------------------------------------------------------------
 .../sql/catalyst/catalog/CatalogUtils.scala     | 33 ++++++++++++++++++++
 .../spark/sql/catalyst/catalog/interface.scala  | 10 +++---
 .../command/createDataSourceTables.scala        | 17 ++++++++++
 .../spark/sql/execution/command/tables.scala    |  3 +-
 .../spark/sql/execution/datasources/ddl.scala   | 30 +++++++++++++++++-
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala   | 28 +++++++++++++++++
 6 files changed, 113 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/bdd27d1e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/CatalogUtils.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/CatalogUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/CatalogUtils.scala
new file mode 100644
index 0000000..1046f79
--- /dev/null
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/CatalogUtils.scala
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.catalog
+
+object CatalogUtils {
+  /**
+   * Masking credentials in the option lists. For example, in the sql plan explain output
+   * for JDBC data sources.
+   */
+  def maskCredentials(options: Map[String, String]): Map[String, String] = {
+    options.map {
+      case (key, _) if key.toLowerCase == "password" => (key, "###")
+      case (key, value) if key.toLowerCase == "url" && value.toLowerCase.contains("password") =>
+        (key, "###")
+      case o => o
+    }
+  }
+}


http://git-wip-us.apache.org/repos/asf/spark/blob/bdd27d1e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
----------------------------------------------------------------------
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
index 8342892..9f221a8 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
@@ -52,12 +52,10 @@ case class CatalogStorageFormat(
     serdeProperties: Map[String, String]) {
 
   override def toString: String = {
-    val serdePropsToString =
-      if (serdeProperties.nonEmpty) {
-        s"Properties: " + serdeProperties.map(p => p._1 + "=" + p._2).mkString("[", ", ", "]")
-      } else {
-        ""
-      }
+    val serdePropsToString = CatalogUtils.maskCredentials(serdeProperties) match {
+      case props if props.isEmpty => ""
+      case props => "Properties: " + props.map(p => p._1 + "=" + p._2).mkString("[", ", ", "]")
+    }
     val output =
       Seq(locationUri.map("Location: " + _).getOrElse(""),
         inputFormat.map("InputFormat: " + _).getOrElse(""),


http://git-wip-us.apache.org/repos/asf/spark/blob/bdd27d1e/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
index de7d1fa..bbd5f46 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala
@@ -32,6 +32,7 @@ import org.apache.spark.sql.execution.datasources._
 import org.apache.spark.sql.internal.HiveSerDe
 import org.apache.spark.sql.sources.InsertableRelation
 import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
 
 /**
  * A command used to create a data source table.
@@ -57,6 +58,22 @@ case class CreateDataSourceTableCommand(
     managedIfNoPath: Boolean)
   extends RunnableCommand {
 
+  override def argString: String = {
+    val partColumns = if (partitionColumns.isEmpty) {
+      ""
+    } else {
+      "partitionColumns:" + Utils.truncatedString(partitionColumns, "[", ", ", "] ")
+    }
+    s"[tableIdent:$tableIdent " +
+      userSpecifiedSchema.map(_ + " ").getOrElse("") +
+      s"provider:$provider " +
+      CatalogUtils.maskCredentials(options) + " " +
+      partColumns +
+      bucketSpec.map(_ + " ").getOrElse("") +
+      s"ignoreIfExists:$ignoreIfExists " +
+      s"managedIfNoPath:$managedIfNoPath]"
+  }
+
   override def run(sparkSession: SparkSession): Seq[Row] = {
     // Since we are saving metadata to metastore, we need to check if metastore supports
     // the table name and database name we have for this query. MetaStoreUtils.validateName
http://git-wip-us.apache.org/repos/asf/spark/blob/bdd27d1e/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
index ad0c779..fb427e4 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
@@ -531,7 +531,8 @@ case class DescribeTableCommand(
       describeBucketingInfo(metadata, buffer)
 
       append(buffer, "Storage Desc Parameters:", "", "")
-      metadata.storage.serdeProperties.foreach { case (key, value) =>
+      val maskedProperties = CatalogUtils.maskCredentials(metadata.storage.serdeProperties)
+      maskedProperties.foreach { case (key, value) =>
         append(buffer, s"  $key", value, "")
       }
     }


http://git-wip-us.apache.org/repos/asf/spark/blob/bdd27d1e/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala
index 857047f..eb3ea95 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ddl.scala
@@ -19,11 +19,13 @@ package org.apache.spark.sql.execution.datasources
 
 import org.apache.spark.sql._
 import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogUtils
 import org.apache.spark.sql.catalyst.plans.QueryPlan
 import org.apache.spark.sql.catalyst.plans.logical
 import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.execution.command.RunnableCommand
 import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
 
 
 /**
@@ -41,7 +43,25 @@ case class CreateTableUsing(
     partitionColumns: Array[String],
     bucketSpec: Option[BucketSpec],
     allowExisting: Boolean,
-    managedIfNoPath: Boolean) extends logical.Command
+    managedIfNoPath: Boolean) extends logical.Command {
+
+  override def argString: String = {
+    val partColumns = if (partitionColumns.isEmpty) {
+      ""
+    } else {
+      "partitionColumns:" + Utils.truncatedString(partitionColumns, "[", ", ", "]")
+    }
+    s"[tableIdent:$tableIdent " +
+      userSpecifiedSchema.map(_ + " ").getOrElse("") +
+      s"provider:$provider " +
+      (if (temporary) "temporary " else "persistent ") +
+      CatalogUtils.maskCredentials(options) + " " +
+      partColumns +
+      bucketSpec.map(_ + " ").getOrElse("") +
+      s"ignoreIfExists:$allowExisting " +
+      s"managedIfNoPath:$managedIfNoPath]"
+  }
+}
 
 /**
  * A node used to support CTAS statements and saveAsTable for the data source API.
@@ -71,6 +91,14 @@ case class CreateTempViewUsing(
       s"Temporary table '$tableIdent' should not have specified a database")
   }
 
+  override def argString: String = {
+    s"[tableIdent:$tableIdent " +
+      userSpecifiedSchema.map(_ + " ").getOrElse("") +
+      s"replace:$replace " +
+      s"provider:$provider " +
+      CatalogUtils.maskCredentials(options)
+  }
+
   def run(sparkSession: SparkSession): Seq[Row] = {
     val dataSource = DataSource(
       sparkSession,


http://git-wip-us.apache.org/repos/asf/spark/blob/bdd27d1e/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
index 50a8dfb..0fc83d3 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
@@ -735,6 +735,34 @@ class JDBCSuite extends SparkFunSuite
     }
   }
 
+  test("hide credentials in create and describe a persistent/temp table") {
+    val password = "testPass"
+    val tableName = "tab1"
+    Seq("TABLE", "TEMPORARY VIEW").foreach { tableType =>
+      withTable(tableName) {
+        val df = sql(
+          s"""
+             |CREATE $tableType $tableName
+             |USING org.apache.spark.sql.jdbc
+             |OPTIONS (
+             |  url '$urlWithUserAndPass',
+             |  dbtable 'TEST.PEOPLE',
+             |  user 'testUser',
+             |  password '$password')
+           """.stripMargin)
+
+        val explain = ExplainCommand(df.queryExecution.logical, extended = true)
+        spark.sessionState.executePlan(explain).executedPlan.executeCollect().foreach { r =>
+          assert(!r.toString.contains(password))
+        }
+
+        sql(s"DESC FORMATTED $tableName")
+
+        sql(s"DESC EXTENDED $tableName")
+      }
+    }
+  }
+
   test("SPARK 12941: The data type mapping for StringType to Oracle") {
     val oracleDialect = JdbcDialects.get("jdbc:oracle://127.0.0.1/db")
     assert(oracleDialect.getJDBCType(StringType).
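For a manual check outside the test suite, a spark-shell session along the following lines should now print `###` for both the `url` and `password` rows. This is a sketch: it assumes the same in-memory H2 fixture (`testdb0` containing `TEST.PEOPLE`) that `JDBCSuite` sets up, with the H2 driver on the classpath.

```scala
// `spark` is the SparkSession provided by spark-shell.
spark.sql(
  """CREATE TABLE tab1
    |USING org.apache.spark.sql.jdbc
    |OPTIONS (
    |  url 'jdbc:h2:mem:testdb0;user=testUser;password=testPass',
    |  dbtable 'TEST.PEOPLE',
    |  user 'testUser',
    |  password 'testPass')
  """.stripMargin)

// The Storage Desc Parameters section should show url=### and password=###
// instead of the actual credentials.
spark.sql("DESC FORMATTED tab1").show(100, truncate = false)
```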
