[ https://issues.apache.org/jira/browse/SPARK-19912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Cheng Lian updated SPARK-19912:
-------------------------------
    Description:
{{Shim_v0_13.convertFilters()}} doesn't escape string literals while generating Hive style partition predicates.

The following SQL-injection-like test case illustrates this issue:
{code}
  test("SPARK-19912") {
    withTable("spark_19912") {
      Seq(
        (1, "p1", "q1"),
        (2, "p1\" and q = \"q1", "q2")
      ).toDF("a", "p", "q").write.partitionBy("p", "q").saveAsTable("spark_19912")

      checkAnswer(
        spark.table("spark_19912").filter($"p" === "p1\" and q = \"q1").select($"a"),
        Row(2)
      )
    }
  }
{code}
The above test case fails like this:
{noformat}
[info] - spark_19912 *** FAILED *** (13 seconds, 74 milliseconds)
[info]   Results do not match for query:
[info]   Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
[info]   Timezone Env:
[info]
[info]   == Parsed Logical Plan ==
[info]   'Project [unresolvedalias('a, None)]
[info]   +- Filter (p#27 = p1" and q = "q1)
[info]      +- SubqueryAlias spark_19912
[info]         +- Relation[a#26,p#27,q#28] parquet
[info]
[info]   == Analyzed Logical Plan ==
[info]   a: int
[info]   Project [a#26]
[info]   +- Filter (p#27 = p1" and q = "q1)
[info]      +- SubqueryAlias spark_19912
[info]         +- Relation[a#26,p#27,q#28] parquet
[info]
[info]   == Optimized Logical Plan ==
[info]   Project [a#26]
[info]   +- Filter (isnotnull(p#27) && (p#27 = p1" and q = "q1))
[info]      +- Relation[a#26,p#27,q#28] parquet
[info]
[info]   == Physical Plan ==
[info]   *Project [a#26]
[info]   +- *FileScan parquet default.spark_19912[a#26,p#27,q#28] Batched: true, Format: Parquet, Location: PrunedInMemoryFileIndex[], PartitionCount: 0, PartitionFilters: [isnotnull(p#27), (p#27 = p1" and q = "q1)], PushedFilters: [], ReadSchema: struct<a:int>
[info]   == Results ==
[info]
[info]   == Results ==
[info]   !== Correct Answer - 1 ==   == Spark Answer - 0 ==
[info]    struct<>                   struct<>
[info]   ![2]
{noformat}

  was:
{{Shim_v0_13.convertFilters()}} doesn't escape string literals while generating Hive style partition predicates.

The following SQL-injection-like test case illustrates this issue:
{code}
  test("foo") {
    withTable("foo") {
      Seq(
        (1, "p1", "q1"),
        (2, "p1\" and q=\"q1", "q2")
      ).toDF("a", "p", "q").write.partitionBy("p", "q").saveAsTable("foo")

      checkAnswer(
        spark.table("foo").filter($"p" === "p1\" and q = \"q1").select($"a"),
        Row(2)
      )
    }
  }
{code}

> String literals are not escaped while performing partition pruning at Hive metastore level
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-19912
>                 URL: https://issues.apache.org/jira/browse/SPARK-19912
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: Cheng Lian
>              Labels: correctness
>
> {{Shim_v0_13.convertFilters()}} doesn't escape string literals while generating Hive style partition predicates.
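The failure mode described in this issue can be illustrated outside Spark. The sketch below is not Spark's {{Shim_v0_13.convertFilters()}} code; the object and method names are hypothetical and exist only for illustration. It assumes that the target filter syntax accepts both single-quoted and double-quoted string literals and that backslash escaping cannot be relied upon, so one possible approach is to pick a quote character the value does not contain and to skip pushdown when neither quote character is safe.

```scala
// Hypothetical sketch; names are illustrative and not part of Spark's API.
object PartitionFilterSketch {

  /** Naive rendering: wraps the value in double quotes without any escaping. */
  def naiveEqualsFilter(column: String, value: String): String =
    column + " = \"" + value + "\""

  /** Quotes a literal with a quote character it does not contain, if one exists. */
  def quoteStringLiteral(value: String): Option[String] =
    if (!value.contains("\"")) Some("\"" + value + "\"")
    else if (!value.contains("'")) Some("'" + value + "'")
    else None // both quote characters present: no safe quoting under our assumption

  /** Builds `column = literal`, or None when the predicate should not be pushed down. */
  def equalsFilter(column: String, value: String): Option[String] =
    quoteStringLiteral(value).map(quoted => s"$column = $quoted")
}

object PartitionFilterSketchDemo extends App {
  val value = "p1\" and q = \"q1"

  // The naive form reads as two separate predicates: p = "p1" and q = "q1"
  println(PartitionFilterSketch.naiveEqualsFilter("p", value))

  // The quoted form keeps the whole value inside one literal.
  println(PartitionFilterSketch.equalsFilter("p", value))
}
```

Returning {{None}} for a value that contains both quote characters lets the caller fall back to evaluating that predicate after partitions are listed, which trades pruning efficiency for correctness.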