[jira] [Updated] (MAPREDUCE-7275) hadoop output to ftp gives rename error on FileOutputCommitter.mergePaths

Steve Loughran (Jira) Fri, 17 Apr 2020 07:30:23 -0700


     [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Steve Loughran updated MAPREDUCE-7275:
--------------------------------------
    Priority: Minor  (was: Major)

> hadoop output to ftp gives rename error on FileOutputCommitter.mergePaths
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7275
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7275
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Talha Azaz
>            Priority: Minor
>
> I'm using Spark in Kubernetes cluster mode, reading data from a DB and 
> writing it in Parquet format to an FTP server. I'm using the Hadoop FTP 
> filesystem for writing. When a task completes, it tries to rename 
> /sensor_values/1585353600000/_temporary/0/_temporary/attempt_20200414075519_0000_m_000021_21/part-00021-d7cef14e-151b-4c3b-a8d8-4e9ab33e80f9-c000.snappy.parquet
> to 
> /sensor_values/1585353600000/part-00021-d7cef14e-151b-4c3b-a8d8-4e9ab33e80f9-c000.snappy.parquet
> But the rename fails with the following error:
> ```
> Lost task 21.0 in stage 0.0 (TID 21, 10.233.90.137, executor 3): org.apache.spark.SparkException: Task failed while writing rows.
>  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:257)
>  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
>  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>  at org.apache.spark.scheduler.Task.run(Task.scala:123)
>  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: Cannot rename source: ftp://user:pass@host/sensor_values/1585353600000/_temporary/0/_temporary/attempt_20200414075519_0000_m_000021_21/part-00021-d7cef14e-151b-4c3b-a8d8-4e9ab33e80f9-c000.snappy.parquet to ftp://user:pass@host/sensor_values/1585353600000/part-00021-d7cef14e-151b-4c3b-a8d8-4e9ab33e80f9-c000.snappy.parquet -only same directory renames are supported
>  at org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:674)
>  at org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:613)
>  at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:472)
>  at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:486)
>  at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:597)
>  at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:560)
>  at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
>  at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:77)
>  at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:225)
>  at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:78)
>  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247)
>  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242)
>  at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
>  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:248)
>  ... 10 more
> ```
> I have done the same thing on the Azure filesystem using the same Spark and 
> Hadoop implementation. 
> Is there any configuration in Hadoop or Spark that needs to be changed, or is 
> this just not supported in the Hadoop FTP filesystem?
> Thanks a lot!!
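
For illustration only: the "only same directory renames are supported" message suggests that the rename is rejected because the source and destination have different parent directories. The following is a minimal sketch of that kind of check, using only the JDK. The class name `SameDirRenameCheck` and the method `sameParent` are hypothetical and are not part of Hadoop; this is not the FTPFileSystem source, just an approximation of the condition the message describes, applied to the two paths from the report.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration; not Hadoop code.
public class SameDirRenameCheck {

    // Returns true only when both paths have the same parent directory,
    // i.e. the rename would stay inside a single directory.
    static boolean sameParent(String src, String dst) {
        Path srcParent = Paths.get(src).getParent();
        Path dstParent = Paths.get(dst).getParent();
        return srcParent != null && srcParent.equals(dstParent);
    }

    public static void main(String[] args) {
        // Paths taken from the issue description above.
        String file = "part-00021-d7cef14e-151b-4c3b-a8d8-4e9ab33e80f9-c000.snappy.parquet";
        String src = "/sensor_values/1585353600000/_temporary/0/_temporary/"
                + "attempt_20200414075519_0000_m_000021_21/" + file;
        String dst = "/sensor_values/1585353600000/" + file;

        // The task commit moves the file out of the attempt directory into
        // the job output directory, so the parents differ.
        System.out.println("same parent directory: " + sameParent(src, dst));
    }
}
```

Under this assumption, the commit step in the stack trace (FileOutputCommitter.mergePaths calling rename) would always cross directories, since it moves files from a `_temporary` attempt directory to the output directory.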



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
