[ https://issues.apache.org/jira/browse/HADOOP-16994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran resolved HADOOP-16994.
-------------------------------------
    Resolution: Won't Fix

> hadoop output to ftp gives rename error on FileOutputCommitter.mergePaths
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-16994
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16994
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Talha Azaz
>            Priority: Major
>
> I'm using Spark in Kubernetes cluster mode, reading data from a DB and
> writing it in Parquet format to an FTP server, using the Hadoop FTP
> filesystem for the writes. When a task completes, the committer tries to rename
> /sensor_values/1585353600000/_temporary/0/_temporary/attempt_20200414075519_0000_m_000021_21/part-00021-d7cef14e-151b-4c3b-a8d8-4e9ab33e80f9-c000.snappy.parquet
> to
> /sensor_values/1585353600000/part-00021-d7cef14e-151b-4c3b-a8d8-4e9ab33e80f9-c000.snappy.parquet
> but the rename fails with the following error:
> ```
> Lost task 21.0 in stage 0.0 (TID 21, 10.233.90.137, executor 3): org.apache.spark.SparkException: Task failed while writing rows.
> 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:257)
> 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
> 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:123)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Cannot rename source: ftp://user:pass@host/sensor_values/1585353600000/_temporary/0/_temporary/attempt_20200414075519_0000_m_000021_21/part-00021-d7cef14e-151b-4c3b-a8d8-4e9ab33e80f9-c000.snappy.parquet to ftp://user:pass@host/sensor_values/1585353600000/part-00021-d7cef14e-151b-4c3b-a8d8-4e9ab33e80f9-c000.snappy.parquet -only same directory renames are supported
> 	at org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:674)
> 	at org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:613)
> 	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:472)
> 	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:486)
> 	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:597)
> 	at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:560)
> 	at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
> 	at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:77)
> 	at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:225)
> 	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:78)
> 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247)
> 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
> 	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:248)
> 	... 10 more
> ```
> I have done the same thing against the Azure filesystem with the same Spark
> and Hadoop implementation, and it works there. Is there any Hadoop or Spark
> configuration that needs to change, or is this simply not supported by the
> Hadoop FTP filesystem?
> Thanks a lot!!
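The root cause is visible in the trace: FileOutputCommitter finalizes a task by renaming files out of the _temporary attempt directory into the job output directory, and FTPFileSystem.rename() rejects any rename whose source and destination parents differ. A minimal sketch that reproduces the restriction directly, outside Spark (host, credentials, and paths below are placeholders, not from the ticket):
```
// Hedged repro sketch of the FTPFileSystem rename restriction.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FtpRenameRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The ftp:// scheme resolves to org.apache.hadoop.fs.ftp.FTPFileSystem.
    FileSystem fs = FileSystem.get(URI.create("ftp://user:pass@host/"), conf);

    // Same shape as the committer's mergePaths() rename: the source sits
    // under the _temporary attempt tree, the destination is the output dir.
    Path src = new Path("/sensor_values/_temporary/0/_temporary/attempt_0/part-00000");
    Path dst = new Path("/sensor_values/part-00000");

    // src.getParent() != dst.getParent(), so FTPFileSystem.rename() throws
    // java.io.IOException: "... -only same directory renames are supported".
    fs.rename(src, dst);
  }
}
```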
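Given the Won't Fix resolution, one workaround (a common pattern, not something suggested in the ticket) is to let the job commit against a filesystem with full rename support, such as HDFS or the local filesystem, and push the finished output to FTP as a separate step, e.g. with FileUtil.copy. URIs and paths here are again placeholders:
```
// Hedged workaround sketch: commit to HDFS, then copy the finished output.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class PublishToFtp {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem hdfs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
    FileSystem ftp = FileSystem.get(URI.create("ftp://user:pass@host/"), conf);

    // Recursively copies the already-committed job output; no
    // cross-directory renames are needed on the FTP side.
    FileUtil.copy(hdfs, new Path("/staging/sensor_values/1585353600000"),
                  ftp, new Path("/sensor_values/1585353600000"),
                  false /* deleteSource */, conf);
  }
}
```
This trades the committer's rename-based finalization for a plain copy, which is usually acceptable when the FTP server is a delivery target rather than the working store.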