[ https://issues.apache.org/jira/browse/SPARK-22967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16316278#comment-16316278 ]
Hyukjin Kwon commented on SPARK-22967:
--------------------------------------
Hm .. I haven't taken a close look at it yet, but can we ignore it in that
case somehow? If that's difficult or impossible, I think we can just skip the
test on Windows, as it's going to fail anyway. I have done this several times;
for example, you can refer to https://github.com/apache/spark/pull/16999.
I know it takes a while to investigate whether other tests fail on Windows,
but it would be really nice if we could identify some more test cases that
fail on Windows and fix them in a batch.
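For reference, a minimal sketch of what such a skip can look like (the suite
and test names here are hypothetical; the guard itself is ScalaTest's assume()
plus org.apache.spark.util.Utils.isWindows):

{code:java}
import org.apache.spark.SparkFunSuite
import org.apache.spark.util.Utils

// Hypothetical suite; only the guard pattern matters.
class WindowsSkipExampleSuite extends SparkFunSuite {
  test("read avro file containing decimal") {
    // assume() cancels (rather than fails) the test when the condition
    // does not hold, so the suite still passes on Windows.
    assume(!Utils.isWindows)
    // ... the Windows-sensitive test body would go here ...
  }
}
{code}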
> VersionsSuite failed on Windows caused by unescapeSQLString()
> ------------------------------------------------------------
>
> Key: SPARK-22967
> URL: https://issues.apache.org/jira/browse/SPARK-22967
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.1
> Environment: Windows 7
> Reporter: wuyi
> Priority: Minor
> Labels: build, test, windows
>
> On Windows, two unit test cases fail while running VersionsSuite ("A simple
> set of tests that call the methods of a `HiveClient`, loading different
> versions of Hive from Maven Central."):
> Failed test A: test(s"$version: read avro file containing decimal")
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
> {code}
> Failed test B: test(s"$version: SPARK-17920: Insert into/overwrite avro table")
> {code:java}
> Unable to infer the schema. The schema specification is required to create the table `default`.`tab2`.;
> org.apache.spark.sql.AnalysisException: Unable to infer the schema. The schema specification is required to create the table `default`.`tab2`.;
> {code}
> As I dug into this problem, I found it is related to
> ParserUtils#unescapeSQLString().
> These are the two lines at the beginning of failed test A:
> {code:java}
> val url = Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
> val location = new File(url.getFile)
> {code}
> In my environment, `location` (the path value) is
> {code:java}
> D:\workspace\IdeaProjects\spark\sql\hive\target\scala-2.11\test-classes\avroDecimal
> {code}
> And then, in SparkSqlParser#visitCreateHiveTable()#L1128:
> {code:java}
> val location = Option(ctx.locationSpec).map(visitLocationSpec)
> {code}
> This line first extracts the LocationSpecContext's content, which is equal
> to `location` above. The content is then passed to visitLocationSpec(), and
> finally to unescapeSQLString().
> Let's have a look at unescapeSQLString():
> {code:java}
> /** Unescape backslash-escaped string enclosed by quotes. */
> def unescapeSQLString(b: String): String = {
>   var enclosure: Character = null
>   val sb = new StringBuilder(b.length())
>
>   def appendEscapedChar(n: Char) {
>     n match {
>       case '0' => sb.append('\u0000')
>       case '\'' => sb.append('\'')
>       case '"' => sb.append('\"')
>       case 'b' => sb.append('\b')
>       case 'n' => sb.append('\n')
>       case 'r' => sb.append('\r')
>       case 't' => sb.append('\t')
>       case 'Z' => sb.append('\u001A')
>       case '\\' => sb.append('\\')
>       // The following 2 lines are exactly what MySQL does TODO: why do we do this?
>       case '%' => sb.append("\\%")
>       case '_' => sb.append("\\_")
>       case _ => sb.append(n)
>     }
>   }
>
>   var i = 0
>   val strLength = b.length
>   while (i < strLength) {
>     val currentChar = b.charAt(i)
>     if (enclosure == null) {
>       if (currentChar == '\'' || currentChar == '\"') {
>         enclosure = currentChar
>       }
>     } else if (enclosure == currentChar) {
>       enclosure = null
>     } else if (currentChar == '\\') {
>       if ((i + 6 < strLength) && b.charAt(i + 1) == 'u') {
>         // \u0000 style character literals.
>         val base = i + 2
>         val code = (0 until 4).foldLeft(0) { (mid, j) =>
>           val digit = Character.digit(b.charAt(j + base), 16)
>           (mid << 4) + digit
>         }
>         sb.append(code.asInstanceOf[Char])
>         i += 5
>       } else if (i + 4 < strLength) {
>         // \000 style character literals.
>         val i1 = b.charAt(i + 1)
>         val i2 = b.charAt(i + 2)
>         val i3 = b.charAt(i + 3)
>
>         if ((i1 >= '0' && i1 <= '1') && (i2 >= '0' && i2 <= '7') && (i3 >= '0' && i3 <= '7')) {
>           val tmp = ((i3 - '0') + ((i2 - '0') << 3) + ((i1 - '0') << 6)).asInstanceOf[Char]
>           sb.append(tmp)
>           i += 3
>         } else {
>           appendEscapedChar(i1)
>           i += 1
>         }
>       } else if (i + 2 < strLength) {
>         // escaped character literals.
>         val n = b.charAt(i + 1)
>         appendEscapedChar(n)
>         i += 1
>       }
>     } else {
>       // non-escaped character literals.
>       sb.append(currentChar)
>     }
>     i += 1
>   }
>   sb.toString()
> }
> {code}
> Again, here, variable `b` is equal to the content above, i.e. `location`, whose value is
> {code:java}
> D:\workspace\IdeaProjects\spark\sql\hive\target\scala-2.11\test-classes\avroDecimal
> {code}
> From unescapeSQLString()'s strategy, we can see that it transforms the
> two-character string "\t" into the escape character '\t' and removes all
> other backslashes.
> So, our originally correct location turns into:
> {code:java}
> D:workspaceIdeaProjectssparksqlhive\targetscala-2.11\test-classesavroDecimal
> {code}
> after unescapeSQLString() completes.
> Note that here [ \t ] is no longer a two-character string but a single tab
> escape character.
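> This mangling can be reproduced in isolation; here is a minimal sketch (the
> object name is just for illustration, and it assumes the parser hands
> unescapeSQLString() the string token including its surrounding quotes, since
> the function only appends characters once it has seen an enclosing quote):
> {code:java}
> import org.apache.spark.sql.catalyst.parser.ParserUtils
>
> object UnescapeRepro {
>   def main(args: Array[String]): Unit = {
>     // The Windows path as a quoted SQL string token; "\\" in Scala source
>     // is a single backslash in the actual string.
>     val token = "'D:\\workspace\\IdeaProjects\\spark\\sql\\hive\\target" +
>       "\\scala-2.11\\test-classes\\avroDecimal'"
>     // Prints the mangled path: "\t" becomes a real tab and every other
>     // backslash is silently dropped.
>     println(ParserUtils.unescapeSQLString(token))
>   }
> }
> {code}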
> Then, returning to SparkSqlParser#visitCreateHiveTable(), move on to L1134:
> {code:java}
> val locUri = location.map(CatalogUtils.stringToURI(_))
> {code}
> `location` is passed to stringToURI(), resulting in:
> {code:java}
> file:/D:workspaceIdeaProjectssparksqlhive%09argetscala-2.11%09est-classesavroDecimal
> {code}
> because the escape character '\t' is percent-encoded as '%09' in the URI.
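> This step can also be reproduced on its own; a minimal sketch (again, the
> object name is illustrative; CatalogUtils.stringToURI() is the method the
> parser calls above):
> {code:java}
> import org.apache.spark.sql.catalyst.catalog.CatalogUtils
>
> object StringToUriRepro {
>   def main(args: Array[String]): Unit = {
>     // The mangled path from the previous step; each \t below is a real tab
>     // character in the string.
>     val mangled = "D:workspaceIdeaProjectssparksqlhive\targetscala-2.11" +
>       "\test-classesavroDecimal"
>     // On the reporter's Windows machine this yields the broken URI above,
>     // with the tabs percent-encoded as %09.
>     println(CatalogUtils.stringToURI(mangled))
>   }
> }
> {code}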
> Although I'm not clear on how this wrong path directly causes that
> exception, since I know almost nothing about Hive, I can verify that this
> wrong path is the real cause of the exception.
> When I append these lines (to fix the wrong path) after
> HiveExternalCatalog#doCreateTable() Line 236-240:
> {code:java}
> if (tableLocation.get.getPath.startsWith("/D")) {
>   tableLocation = Some(CatalogUtils.stringToURI(
>     "file:/D:/workspace/IdeaProjects/spark/sql/hive/target/scala-2.11/test-classes/avroDecimal"))
> }
> {code}
>
> then failed test A passes (test B still fails).
> Below is the stack trace of the exception:
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:602)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply$mcV$sp(HiveClientImpl.scala:469)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:467)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:467)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:273)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:210)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:209)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:256)
>   at org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:467)
>   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply$mcV$sp(HiveExternalCatalog.scala:263)
>   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
>   at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
>   at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   at org.apache.spark.sql.hive.HiveExternalCatalog.doCreateTable(HiveExternalCatalog.scala:216)
>   at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.createTable(ExternalCatalog.scala:119)
>   at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:304)
>   at org.apache.spark.sql.execution.command.CreateTableCommand.run(tables.scala:128)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:186)
>   at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:186)
>   at org.apache.spark.sql.Dataset$$anonfun$51.apply(Dataset.scala:3196)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
>   at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3195)
>   at org.apache.spark.sql.Dataset.<init>(Dataset.scala:186)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:71)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
>   at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
>   at org.apache.spark.sql.hive.client.VersionsSuite$$anonfun$6$$anonfun$apply$24$$anonfun$apply$mcV$sp$3.apply$mcV$sp(VersionsSuite.scala:829)
>   at org.apache.spark.sql.hive.client.VersionsSuite.withTable(VersionsSuite.scala:70)
>   at org.apache.spark.sql.hive.client.VersionsSuite$$anonfun$6$$anonfun$apply$24.apply$mcV$sp(VersionsSuite.scala:828)
>   at org.apache.spark.sql.hive.client.VersionsSuite$$anonfun$6$$anonfun$apply$24.apply(VersionsSuite.scala:805)
>   at org.apache.spark.sql.hive.client.VersionsSuite$$anonfun$6$$anonfun$apply$24.apply(VersionsSuite.scala:805)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
>   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
>   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
>   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1560)
>   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
>   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
>   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
>   at org.scalatest.Suite$class.run(Suite.scala:1147)
>   at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
>   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
>   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
>   at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31)
>   at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
>   at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:31)
>   at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
>   at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1340)
>   at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1334)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1334)
>   at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1011)
>   at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1010)
>   at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1500)
>   at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1010)
>   at org.scalatest.tools.Runner$.run(Runner.scala:850)
>   at org.scalatest.tools.Runner.run(Runner.scala)
>   at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:138)
>   at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
> Caused by: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1121)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
>   at com.sun.proxy.$Proxy31.create_table_with_environment_context(Unknown Source)
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:482)
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:471)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
>   at com.sun.proxy.$Proxy32.createTable(Unknown Source)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:596)
>   ... 78 more
> Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:184)
>   at org.apache.hadoop.fs.Path.getParent(Path.java:357)
>   at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:427)
>   at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:690)
>   at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:194)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1059)
>   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1107)
>   ... 93 more
> {code}
> As for test B, I didn't do a careful inspection, but I found the same wrong
> path as in test A, so I guess both exceptions were caused by the same factor.
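> One untested idea for avoiding the mangling in the test itself: build the
> LOCATION string from the resource URI, which uses forward slashes even on
> Windows, so unescapeSQLString() has no backslashes to strip (a sketch only;
> I have not verified it against the suite):
> {code:java}
> // Untested sketch: URL#toURI yields a path like
> // /D:/workspace/IdeaProjects/spark/sql/hive/target/scala-2.11/test-classes/avroDecimal
> // whose forward slashes pass through unescapeSQLString() unchanged.
> val url = Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
> val location = url.toURI.getPath
> {code}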
>