[ https://issues.apache.org/jira/browse/HADOOP-18856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-18856:
------------------------------------
Summary: Spark insertInto with location GCS bucket root not supported
(was: Spark insertInto with location GCS bucket root causes NPE)
> Spark insertInto with location GCS bucket root not supported
> ------------------------------------------------------------
>
> Key: HADOOP-18856
> URL: https://issues.apache.org/jira/browse/HADOOP-18856
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Affects Versions: 3.3.3
> Reporter: Dipayan Dev
> Priority: Minor
>
>
> {noformat}
> scala> import org.apache.hadoop.fs.Path
> import org.apache.hadoop.fs.Path
> scala> val path: Path = new Path("gs://test_dd123/")
> path: org.apache.hadoop.fs.Path = gs://test_dd123/
> scala> path.suffix("/num=123")
> java.lang.NullPointerException
> at org.apache.hadoop.fs.Path.<init>(Path.java:150)
> at org.apache.hadoop.fs.Path.<init>(Path.java:129)
> at org.apache.hadoop.fs.Path.suffix(Path.java:450){noformat}
>
> Path.suffix throws an NPE when the path is the root of a GCS bucket.
>
> In our organisation, the Hive table location points at the root of a GCS
> bucket. The latest Dataproc release, 2.1, uses *Hadoop* *3.3.3*, so this needs
> to be fixed in 3.3.3.
> Spark Scala code to reproduce the issue:
> {noformat}
> val DF = Seq(("test1", 123)).toDF("name", "num")
> DF.write.option("path", "gs://test_dd123/").mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("schema_name.table_name")
> val DF1 = Seq(("test2", 125)).toDF("name", "num")
> DF1.write.mode(SaveMode.Overwrite).format("orc").insertInto("schema_name.table_name")
> java.lang.NullPointerException
> at org.apache.hadoop.fs.Path.<init>(Path.java:141)
> at org.apache.hadoop.fs.Path.<init>(Path.java:120)
> at org.apache.hadoop.fs.Path.suffix(Path.java:441)
> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.$anonfun$getCustomPartitionLocations$1(InsertIntoHadoopFsRelationCommand.scala:254)
> {noformat}
>
>
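The stack traces above are consistent with Path.suffix building its result from getParent(), which returns null for a bucket root. The sketch below illustrates that and one possible way to build the child path without going through suffix. The helper name appendSegment is hypothetical; it is not part of Hadoop or Spark, and this is not the fix adopted for this issue.

```scala
import org.apache.hadoop.fs.Path

// Hypothetical helper (not a Hadoop or Spark API): resolve a child segment
// against a base path using the Path(Path, String) constructor, which does
// not need the base path to have a parent.
def appendSegment(base: Path, segment: String): Path = {
  // Drop leading slashes so the segment is resolved relative to base.
  val child = segment.dropWhile(_ == '/')
  require(child.nonEmpty, "segment must contain at least one non-slash character")
  new Path(base, child)
}

val root = new Path("gs://test_dd123/")

// A bucket root has no parent, which is what Path.suffix trips over.
val rootHasParent = root.getParent != null

// Builds the partition directory under the root without calling suffix.
val partitionDir = appendSegment(root, "/num=123")
```

For a non-root path such as gs://test_dd123/table, appendSegment(path, "/num=123") should yield the same location as path.suffix("/num=123"), so the helper only changes behaviour in the root case.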