Hi Experts,

Are there any comments on this issue?
Thanks!

2015-04-29 10:35 GMT+08:00 sam liu <samliuhad...@gmail.com>:

> for IIS ftp server on Windows, seems the distcp tool always failed on the
> line 'client.setFileTransferMode(FTP.BLOCK_TRANSFER_MODE)' in
> hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ftp/FTPFileSystem.java#connect()
>
> Opened a jira for this issue: HADOOP-11886
>
> 2015-04-27 16:36 GMT+08:00 sam liu <samliuhad...@gmail.com>:
>
>> Hi Experts,
>>
>> It is really weird that DistCp could successfully get the file from
>> FileZilla ftp server on Windows7, but failed from the IIS ftp server on the
>> same Windows7 OS(but I can get file using wget directly: 'wget
>> ftp://Viewer:passw...@hostname1.com:21/ftp_file1.txt' ). I tried several
>> times, but all failed and encountered different error messages as below.
>>
>> Any comments?
>>
>> *[Success on FileZilla ftp server on Windows7]:*
>> [h...@hostname2.com ~]$ hadoop distcp
>> ftp://ftp:f...@hostname1.com:121/ftp_test.txt /tmp/
>> 15/04/26 22:56:20 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://ftp:f...@hostname1.com:121/ftp_test.txt], targetPath=/tmp,
>> targetPathExists=true, preserveRawXattrs=false}
>> 15/04/26 22:56:21 INFO impl.TimelineClientImpl: Timeline service address:
>> http://hostname2.com:8188/ws/v1/timeline/
>> 15/04/26 22:56:21 INFO client.RMProxy: Connecting to ResourceManager at
>> hostname2.com/9.32.249.181:8050
>> 15/04/26 22:56:43 INFO impl.TimelineClientImpl: Timeline service address:
>> http://hostname2.com:8188/ws/v1/timeline/
>> 15/04/26 22:56:43 INFO client.RMProxy: Connecting to ResourceManager at
>> hostname2.com/9.32.249.181:8050
>> 15/04/26 22:56:43 INFO mapreduce.JobSubmitter: number of splits:1
>> 15/04/26 22:56:44 INFO mapreduce.JobSubmitter: Submitting tokens for job:
>> job_1429858372957_0002
>> 15/04/26 22:56:44 INFO impl.YarnClientImpl: Submitted application
>> application_1429858372957_0002
>> 15/04/26 22:56:44 INFO mapreduce.Job: The url to track the job:
>> http://hostname2.com:8088/proxy/application_1429858372957_0002/
>> 15/04/26 22:56:44 INFO tools.DistCp: DistCp job-id: job_1429858372957_0002
>> 15/04/26 22:56:44 INFO mapreduce.Job: Running job: job_1429858372957_0002
>> 15/04/26 22:56:51 INFO mapreduce.Job: Job job_1429858372957_0002 running
>> in uber mode : false
>> 15/04/26 22:56:51 INFO mapreduce.Job: map 0% reduce 0%
>>
>> *[Failure 1 on IIS ftp server on the same Windows7 OS] :*
>> [h...@hostname2.com ~]$ hadoop distcp
>> ftp://Viewer:passw...@hostname1.com:21/ftp_file1.txt /tmp/
>> 15/04/27 00:02:45 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://Viewer:passw...@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
>> targetPathExists=true, preserveRawXattrs=false}
>> 15/04/27 00:02:47 INFO impl.TimelineClientImpl: Timeline service address:
>> http://hostname2.com:8188/ws/v1/timeline/
>> 15/04/27 00:02:47 INFO client.RMProxy: Connecting to ResourceManager at
>> hostname2.com/9.32.249.181:8050
>> 15/04/27 00:03:50 ERROR tools.DistCp: Invalid input:
>> org.apache.hadoop.tools.CopyListing$InvalidInputException:
>> ftp://Viewer:passw...@hostname1.com:21/ftp_file1.txt doesn't exist
>>     at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
>>     at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>>     at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>>     at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>>
>> *[Failure 2 on IIS ftp server on the same Windows7 OS] :*
>> [biad...@hostname2.com ~]$ hadoop distcp
>> ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt /tmp/
>> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://Viewer:passw0rd@9.126.146.71/ftp-win.txt], targetPath=/tmp,
>> targetPathExists=true}
>> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
>> hostname2.com/9.32.249.181:8032
>> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
>> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection
>> closed without indication.
>>     at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>>     at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>>     at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>>     at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>>     at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>>     at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>>     at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>>     at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>>     at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>>     at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>>     at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>>     at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>>     at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>>
>> *[Failure 3 on IIS ftp server on the same Windows7 OS] :*
>> [h...@hostname2.com ~]$ hadoop distcp
>> ftp://Viewer:passw...@hostname1.com:21/ftp_file1.txt /tmp/
>> 15/04/27 00:08:18 INFO tools.DistCp: Input Options:
>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>> ftp://Viewer:passw...@hostname1.com:21/ftp_file1.txt], targetPath=/tmp,
>> targetPathExists=true, preserveRawXattrs=false}
>> 15/04/27 00:08:19 INFO impl.TimelineClientImpl: Timeline service address:
>> http://hostname2.com:8188/ws/v1/timeline/
>> 15/04/27 00:08:19 INFO client.RMProxy: Connecting to ResourceManager at
>> hostname2.com/9.32.249.181:8050
>> 15/04/27 00:10:29 ERROR tools.DistCp: Exception encountered
>> java.net.SocketException: Connection reset
>>     at java.net.SocketInputStream.read(SocketInputStream.java:196)
>>     at java.net.SocketInputStream.read(SocketInputStream.java:122)
>>     at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
>>     at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
>>     at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
>>     at java.io.InputStreamReader.read(InputStreamReader.java:184)
>>     at java.io.BufferedReader.fill(BufferedReader.java:154)
>>     at java.io.BufferedReader.read(BufferedReader.java:175)
>>     at org.apache.commons.net.io.CRLFLineReader.readLine(CRLFLineReader.java:58)
>>     at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:310)
>>     at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>>     at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>>     at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>>     at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:162)
>>     at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:410)
>>     at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>>     at org.apache.hadoop.fs.Globber.glob(Globber.java:252)
>>     at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1625)
>>     at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>>     at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>>     at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
>>     at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
>>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
>>
>> Thanks!
>>
>>
>> 2015-02-02 15:41 GMT+08:00 sam liu <samliuhad...@gmail.com>:
>>
>>> Hi Experts,
>>>
>>> I could run distcp against ftp server installed on Linux, but could NOT
>>> run distcp against ftp server installed on Windows. Below are the steps.
>>>
>>> Is this a DistCp bug? Any comments?
>>>
>>> [Scenario 1]
>>> I installed a BI cluster using trunk build on HadoopNode1, and then
>>> could copy file from a ftp installed on Linux to hdfs using command:
>>> hadoop distcp ftp://user1:user1@9.185.68.201/home/user1/ftp.txt
>>> hdfs://HadoopNode1:9000/tmp/
>>>
>>> [Scenario 2]
>>> On the same hadoop node, I can copy file from a remote ftp server
>>> installed on Windows7 using command:
>>> wget ftp://Viewer:password1@9.126.148.79/ftp-win.txt.
>>>
>>> But I failed to copy file from a ftp installed on Windows7 to hdfs using
>>> command:
>>> [user1@HadoopNode1 ~]$ hadoop distcp
>>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt /tmp/
>>> 15/02/01 23:03:37 INFO tools.DistCp: Input Options:
>>> DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false,
>>> ignoreFailures=false, maxMaps=20, sslConfigurationFile='null',
>>> copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[
>>> ftp://Viewer:password1@9.126.148.79/ftp-win.txt], targetPath=/tmp,
>>> targetPathExists=true}
>>> 15/02/01 23:03:38 INFO client.RMProxy: Connecting to ResourceManager at
>>> HadoopNode1/9.30.239.166:8032
>>> 15/02/01 23:05:50 ERROR tools.DistCp: Exception encountered
>>> org.apache.commons.net.ftp.FTPConnectionClosedException: Connection
>>> closed without indication.
>>>     at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:313)
>>>     at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:290)
>>>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:479)
>>>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:552)
>>>     at org.apache.commons.net.ftp.FTP.sendCommand(FTP.java:601)
>>>     at org.apache.commons.net.ftp.FTP.quit(FTP.java:809)
>>>     at org.apache.commons.net.ftp.FTPClient.logout(FTPClient.java:979)
>>>     at org.apache.hadoop.fs.ftp.FTPFileSystem.disconnect(FTPFileSystem.java:151)
>>>     at org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPFileSystem.java:395)
>>>     at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
>>>     at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
>>>     at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1632)
>>>     at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
>>>     at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)
>>>     at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:342)
>>>     at org.apache.hadoop.tools.DistCp.execute(DistCp.java:154)
>>>     at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>>     at org.apache.hadoop.tools.DistCp.main(DistCp.java:390)
>>>
>>> Thanks!
>>>
>>
>>
>
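
For anyone who wants to check the `client.setFileTransferMode(FTP.BLOCK_TRANSFER_MODE)` observation outside Hadoop, here is a rough sketch. It assumes the failure comes down to how the server answers the FTP `MODE B` command (block mode in RFC 959), which is not confirmed anywhere in this thread. It uses Python's standard `ftplib` rather than Commons Net, so it is not what `FTPFileSystem` itself does, and the helper names `reply_is_success` and `probe_transfer_modes` are made up for illustration.

```python
from ftplib import FTP, all_errors, error_perm, error_temp


def reply_is_success(reply: str) -> bool:
    """Return True if an FTP reply line starts with a 2xx completion code."""
    return reply[:1] == "2"


def probe_transfer_modes(host, port=21, user="anonymous", password="", timeout=30):
    """Log in and ask the server for stream (S) and then block (B) mode.

    Returns a dict mapping the mode letter to the server's reply text.
    Connection-level failures (reset, closed socket) are not caught here
    and propagate to the caller.
    """
    replies = {}
    ftp = FTP()
    ftp.connect(host, port, timeout=timeout)
    try:
        ftp.login(user, password)
        for mode in ("S", "B"):
            try:
                # sendcmd returns the reply text for 1xx-3xx replies.
                replies[mode] = ftp.sendcmd("MODE " + mode)
            except (error_temp, error_perm) as exc:
                # 4xx/5xx replies are raised; keep the reply text instead.
                replies[mode] = str(exc)
    finally:
        try:
            ftp.quit()
        except all_errors:
            # The server may already have dropped the connection.
            ftp.close()
    return replies


# Hypothetical usage; host and credentials are placeholders:
# for mode, reply in probe_transfer_modes("hostname1.com", 21, "Viewer", "secret").items():
#     print(mode, reply_is_success(reply), reply)
```

Running it against both the FileZilla and the IIS server should show whether the two differ in their answer to `MODE B`, or whether the connection is dropped before that point.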