[ https://issues.apache.org/jira/browse/SPARK-34298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17279006#comment-17279006 ]
cornel creanga commented on SPARK-34298:
----------------------------------------

Thanks for the answer. In that case, wouldn't it be better to implement option a) - throw an explicit error with a meaningful message (e.g. "root dirs are not supported in overwrite mode") when trying to use a root dir? Right now one gets a java.lang.IndexOutOfBoundsException and has to dig into the Spark code to understand what the problem is (as happened to me).

> SaveMode.Overwrite not usable when using s3a root paths
> --------------------------------------------------------
>
>                 Key: SPARK-34298
>                 URL: https://issues.apache.org/jira/browse/SPARK-34298
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.1.2
>            Reporter: cornel creanga
>            Priority: Minor
>
> SaveMode.Overwrite does not work when using paths containing just the root, e.g. "s3a://peakhour-report". To reproduce the issue (an S3 bucket + credentials are needed):
> {code:scala}
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.{Row, SaveMode}
> import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
>
> // spark is an existing SparkSession; accessKey/secretKey hold real S3 credentials
> val out = "s3a://peakhour-report"
> val sparkContext: SparkContext = SparkContext.getOrCreate()
> val someData = Seq(Row(24, "mouse"))
> val someSchema = List(StructField("age", IntegerType, true), StructField("word", StringType, true))
> val someDF = spark.createDataFrame(spark.sparkContext.parallelize(someData), StructType(someSchema))
> sparkContext.hadoopConfiguration.set("fs.s3a.access.key", accessKey)
> sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", secretKey)
> sparkContext.hadoopConfiguration.set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
> sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
> someDF.write.format("parquet").partitionBy("age").mode(SaveMode.Overwrite).save(out)
> {code}
> Error stacktrace:
> {noformat}
> Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from an empty string
> at org.apache.hadoop.fs.Path.checkPathArg(Path.java:168) [....]
> at org.apache.hadoop.fs.Path.suffix(Path.java:446)
> at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.deleteMatchingPartitions(InsertIntoHadoopFsRelationCommand.scala:240)
> {noformat}
> If you change out from val out = "s3a://peakhour-report" to val out = "s3a://peakhour-report/folder", the code works.
> There are two problems in the actual code from InsertIntoHadoopFsRelationCommand.deleteMatchingPartitions:
> a) it uses the org.apache.hadoop.fs.Path.suffix method, which doesn't work on root paths
> b) it tries to delete the root folder directly (in our case the S3 bucket name), which is prohibited (in the S3AFileSystem class)
> I think that there are two choices:
> a) throw an explicit error when using overwrite mode for root folders (sketched below)
> b) fix the actual issue: don't use the Path.suffix method, and change the clean-up code in InsertIntoHadoopFsRelationCommand.deleteMatchingPartitions to list the root folder content and delete the entries one by one (sketched below)
> I can provide a patch for both choices, assuming that they make sense.
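>
> To make problem a) concrete, here is a minimal snippet showing why org.apache.hadoop.fs.Path.suffix fails on a root path (the bucket name is the one from the repro above):
> {code:scala}
> import org.apache.hadoop.fs.Path
>
> // suffix() rebuilds the path from its parent; a root path has no parent
> // and an empty name, so Path's constructor rejects the empty child.
> new Path("s3a://peakhour-report/folder").suffix("-staging") // ok
> new Path("s3a://peakhour-report").suffix("-staging")        // IllegalArgumentException: Can not create a Path from an empty string
> {code}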
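>
> For choice a), a guard along these lines could run before any clean-up is attempted. This is only a sketch: checkOverwriteRootPath is an illustrative helper, not an existing Spark method (Path.isRoot is a real Hadoop API):
> {code:scala}
> import org.apache.hadoop.fs.Path
>
> // Fail fast with a meaningful message instead of the confusing
> // "Can not create a Path from an empty string" error.
> def checkOverwriteRootPath(outputPath: Path): Unit = {
>   if (outputPath.isRoot) {
>     throw new IllegalArgumentException(
>       s"SaveMode.Overwrite is not supported on root paths such as '$outputPath'; " +
>        "write into a sub-directory of the bucket instead.")
>   }
> }
> {code}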
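>
> For choice b), a rough, untested sketch of the per-entry clean-up: list the root's children and delete each one, so the root itself (the bucket) is never deleted. deleteRootContents is an illustrative helper name; FileSystem.listStatus and FileSystem.delete are real Hadoop APIs:
> {code:scala}
> import org.apache.hadoop.fs.{FileSystem, Path}
>
> // Delete every entry under the root individually instead of calling
> // fs.delete(root, true), which S3AFileSystem refuses for a bucket root.
> // Returns false (and stops) as soon as any individual delete fails.
> def deleteRootContents(fs: FileSystem, root: Path): Boolean = {
>   fs.listStatus(root).forall(status => fs.delete(status.getPath, true))
> }
> {code}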