Talha Azaz created HADOOP-16994:
-----------------------------------
Summary: hadoop output to ftp gives rename error on
FileOutputCommitter.mergePaths
Key: HADOOP-16994
URL: https://issues.apache.org/jira/browse/HADOOP-16994
Project: Hadoop Common
Issue Type: Bug
Components: fs
Reporter: Talha Azaz
i'm using spark in kubernetes cluster mode and trying to write read data from
DB and write in parquet format to ftp server. I'm using hadoop ftp filesystem
for writing. When the task completes, it tries to rename
/sensor_values/1585353600000/_temporary/0/_temporary/attempt_20200414075519_0000_m_000021_21/part-00021-d7cef14e-151b-4c3b-a8d8-4e9ab33e80f9-c000.snappy.parquet
to
/sensor_values/1585353600000/part-00021-d7cef14e-151b-4c3b-a8d8-4e9ab33e80f9-c000.snappy.parquet
But the problem is it gives the following error:
```
Lost task 21.0 in stage 0.0 (TID 21, 10.233.90.137, executor 3):
org.apache.spark.SparkException: Task failed while writing rows.
at
org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:257)
at
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
at
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Cannot rename source:
ftp://user:pass@host/sensor_values/1585353600000/_temporary/0/_temporary/attempt_20200414075519_0000_m_000021_21/part-00021-d7cef14e-151b-4c3b-a8d8-4e9ab33e80f9-c000.snappy.parquet
to
ftp://user:pass@host/sensor_values/1585353600000/part-00021-d7cef14e-151b-4c3b-a8d8-4e9ab33e80f9-c000.snappy.parquet
-only same directory renames are supported
at org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:674)
at org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:613)
at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:472)
at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:486)
at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:597)
at
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:560)
at
org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
at
org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:77)
at
org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:225)
at
org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:78)
at
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247)
at
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242)
at
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
at
org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:248)
... 10 more
```
I have done the same thing on Azure filesystem using same spark and hadoop
implimentation.
Is there any configuration in hadoop or spark that needs to be changed or is it
just not supported in hadoop ftp file System?
Thanks a lot!!
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]