[ 
https://issues.apache.org/jira/browse/HADOOP-18856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-18856:
------------------------------------
    Summary: Spark insertInto with location GCS bucket root not supported  
(was: Spark insertInto with location GCS bucket root causes NPE)

> Spark insertInto with location GCS bucket root not supported
> ------------------------------------------------------------
>
>                 Key: HADOOP-18856
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18856
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>    Affects Versions: 3.3.3
>            Reporter: Dipayan Dev
>            Priority: Minor
>
>  
> {noformat}
> scala> import org.apache.hadoop.fs.Path
> import org.apache.hadoop.fs.Path
> scala> val path: Path = new Path("gs://test_dd123/")
> path: org.apache.hadoop.fs.Path = gs://test_dd123/
> scala> path.suffix("/num=123")
> java.lang.NullPointerException
>   at org.apache.hadoop.fs.Path.<init>(Path.java:150)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:129)
>   at org.apache.hadoop.fs.Path.suffix(Path.java:450){noformat}
>  
> Path.suffix throws an NPE when the write target is a GCS bucket root.
>  
> In our organisation, we use a GCS bucket root as the location of our 
> Hive tables. Dataproc's latest 2.1 image ships *Hadoop* *3.3.3*, so this 
> needs to be fixed in 3.3.3.
> Spark Scala code to reproduce this issue:
> {noformat}
> val DF = Seq(("test1", 123)).toDF("name", "num")
> DF.write.option("path", 
> "gs://test_dd123/").mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("schema_name.table_name")
> val DF1 = Seq(("test2", 125)).toDF("name", "num")
> DF1.write.mode(SaveMode.Overwrite).format("orc").insertInto("schema_name.table_name")
> java.lang.NullPointerException
>   at org.apache.hadoop.fs.Path.<init>(Path.java:141)
>   at org.apache.hadoop.fs.Path.<init>(Path.java:120)
>   at org.apache.hadoop.fs.Path.suffix(Path.java:441)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.$anonfun$getCustomPartitionLocations$1(InsertIntoHadoopFsRelationCommand.scala:254)
>  {noformat}
>  
>  
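For context on the stack trace above: in Hadoop, Path.suffix is implemented as new Path(getParent(), getName() + suffix), and getParent() returns null for a root path such as gs://test_dd123/, so the Path(Path parent, ...) constructor dereferences null. The sketch below is a simplified stand-in (a hypothetical MiniPath class, not the real org.apache.hadoop.fs.Path) that reproduces this mechanism and shows that resolving the child against the root directly avoids it:

```java
import java.net.URI;

// Simplified sketch of the relevant org.apache.hadoop.fs.Path logic
// (illustration only; the real class handles many more cases).
class MiniPath {
    final URI uri;

    MiniPath(String s) { this.uri = URI.create(s); }

    // Mirrors Path(Path parent, String child): resolves child against parent.
    // A null parent is dereferenced here, which is where the NPE surfaces.
    MiniPath(MiniPath parent, String child) {
        this.uri = parent.uri.resolve(child);
    }

    // Root paths ("gs://bucket/") have no parent, so this returns null.
    MiniPath getParent() {
        String p = uri.getPath();
        if (p.isEmpty() || p.equals("/")) return null;
        int slash = p.lastIndexOf('/');
        String parentPath = (slash <= 0) ? "/" : p.substring(0, slash);
        return new MiniPath(uri.getScheme() + "://" + uri.getAuthority() + parentPath);
    }

    String getName() {
        String p = uri.getPath();
        return p.substring(p.lastIndexOf('/') + 1);
    }

    // Mirrors Path.suffix(String): new Path(getParent(), getName() + suffix).
    MiniPath suffix(String suffix) {
        return new MiniPath(getParent(), getName() + suffix);
    }

    public static void main(String[] args) {
        MiniPath root = new MiniPath("gs://test_dd123/");
        try {
            root.suffix("/num=123");         // getParent() is null -> NPE
        } catch (NullPointerException e) {
            System.out.println("NPE at bucket root, as in HADOOP-18856");
        }
        // Resolving the child against the root directly sidesteps suffix():
        MiniPath child = new MiniPath(root, "num=123");
        System.out.println(child.uri);       // gs://test_dd123/num=123
    }
}
```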



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
