[
https://issues.apache.org/jira/browse/HDFS-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443875#comment-13443875
]
TaoZhang commented on HDFS-1109:
--------------------------------
I am using distcp to copy data from Hadoop 1.0.3 (via hftp) to Hadoop 2.0.1
(hdfs).
When the file path (or file name) contains Chinese characters, an
exception is thrown.
I have also tried distcp from 1.0.3 to 1.0.3 and from 2.0.1 to 2.0.1.
Both fail when the path contains Chinese characters.
----------------------
1.0.3 hftp to 1.0.3 hdfs; the exception output is below.
12/08/29 00:24:23 INFO tools.DistCp: sourcePathsCount=2
12/08/29 00:24:23 INFO tools.DistCp: filesToCopyCount=1
12/08/29 00:24:23 INFO tools.DistCp: bytesToCopyCount=1.2k
12/08/29 00:24:24 INFO mapred.JobClient: Running job: job_201208101345_2203
12/08/29 00:24:25 INFO mapred.JobClient: map 0% reduce 0%
12/08/29 00:24:46 INFO mapred.JobClient: Task Id : attempt_201208101345_2203_m_000000_0, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:582)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
12/08/29 00:25:04 INFO mapred.JobClient: Task Id : attempt_201208101345_2203_m_000000_1, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:582)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
12/08/29 00:25:19 INFO mapred.JobClient: Task Id : attempt_201208101345_2203_m_000000_2, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:582)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
12/08/29 00:25:40 INFO mapred.JobClient: Job complete: job_201208101345_2203
12/08/29 00:25:40 INFO mapred.JobClient: Counters: 6
12/08/29 00:25:40 INFO mapred.JobClient: Job Counters
12/08/29 00:25:40 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=66844
12/08/29 00:25:40 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/08/29 00:25:40 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/08/29 00:25:40 INFO mapred.JobClient: Launched map tasks=4
12/08/29 00:25:40 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/08/29 00:25:40 INFO mapred.JobClient: Failed map tasks=1
12/08/29 00:25:40 INFO mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201208101345_2203_m_000000
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:667)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
-----------------------------------------
2.0.1 hftp to 2.0.1 hdfs; the exception output is below.
12/08/29 00:20:06 INFO tools.DistCp: DistCp job-id: job_1345831938927_0043
12/08/29 00:20:06 INFO mapreduce.Job: Running job: job_1345831938927_0043
12/08/29 00:20:14 INFO mapreduce.Job: Job job_1345831938927_0043 running in uber mode : false
12/08/29 00:20:14 INFO mapreduce.Job: map 0% reduce 0%
12/08/29 00:20:23 INFO mapreduce.Job: Task Id : attempt_1345831938927_0043_m_000000_0, Status : FAILED
Error: java.io.IOException: File copy failed: hftp://baby20:50070/tmp/??.log/add.csv --> hdfs://baby20:54310/tmp4/add.csv
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:262)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:229)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:725)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
Caused by: java.io.IOException: Couldn't run retriable-command: Copying hftp://baby20:50070/tmp/中文.log/add.csv to hdfs://baby20:54310/tmp4/add.csv
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:258)
... 10 more
Caused by: org.apache.hadoop.tools.mapred.RetriableFileCopyCommand$CopyReadException: java.io.IOException: HTTP_OK expected, received 400
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:201)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyBytes(RetriableFileCopyCommand.java:167)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.copyToTmpFile(RetriableFileCopyCommand.java:112)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:90)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:71)
at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
... 11 more
Caused by: java.io.IOException: HTTP_OK expected, received 400
at org.apache.hadoop.hdfs.HftpFileSystem$RangeHeaderInputStream.checkResponseCode(HftpFileSystem.java:381)
at org.apache.hadoop.hdfs.ByteRangeInputStream.openInputStream(ByteRangeInputStream.java:121)
at org.apache.hadoop.hdfs.ByteRangeInputStream.getInputStream(ByteRangeInputStream.java:103)
at org.apache.hadoop.hdfs.ByteRangeInputStream.read(ByteRangeInputStream.java:158)
at java.io.DataInputStream.read(DataInputStream.java:132)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.FilterInputStream.read(FilterInputStream.java:90)
at org.apache.hadoop.tools.util.ThrottledInputStream.read(ThrottledInputStream.java:70)
at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.readBytes(RetriableFileCopyCommand.java:198)
... 16 more
12/08/29 00:20:23 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://baby19:8080/tasklog?plaintext=true&attemptid=attempt_1345831938927_0043_m_000000_0&filter=stdout
12/08/29 00:20:23 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://baby19:8080/tasklog?plaintext=true&attemptid=attempt_1345831938927_0043_m_000000_0&filter=stderr
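
For reference, here is a minimal sketch of how a path like the one in the trace above looks once its non-ASCII characters are percent-encoded as UTF-8 for an HTTP request. The class and method names are hypothetical, the servlet prefix that HFTP adds to the path is left out because the log does not show it, and whether a missing encoding step is what produces the 400 here is an assumption that the trace does not confirm.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Hadoop: shows java.net.URI percent-encoding
// a non-ASCII path as UTF-8 so it can be sent on an HTTP request line.
final class NonAsciiPathDemo {

    // Builds an http URL from its parts and returns the US-ASCII form,
    // in which every non-ASCII character is a UTF-8 percent-escape.
    static String toAsciiUrl(String host, int port, String path) {
        try {
            return new URI("http", null, host, port, path, null, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // "\u4E2D\u6587" is the two-character directory name from the log above.
        String path = "/tmp/\u4E2D\u6587.log/add.csv";
        // prints http://baby20:50070/tmp/%E4%B8%AD%E6%96%87.log/add.csv
        System.out.println(toAsciiUrl("baby20", 50070, path));
    }
}
```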
> HFTP and URL Encoding
> ---------------------
>
> Key: HDFS-1109
> URL: https://issues.apache.org/jira/browse/HDFS-1109
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: contrib/hdfsproxy, data-node
> Affects Versions: 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.22.0
> Reporter: Dmytro Molkov
> Assignee: Dmytro Molkov
> Fix For: 0.22.0
>
> Attachments: HDFS-1109.2.patch,
> HDFS-1109.2_y0.20.1xx_incremental.patch, HDFS-1109.2_y0.20.1xx.patch,
> HDFS-1109.patch
>
>
> We just saw this error happen in our cluster. If a file has a
> "+" sign in its name, it is not readable through the HFTP protocol.
> The problem is that when reading a file over HFTP we pass the file name as a
> request parameter, and the + gets decoded into a space on the
> server side. So the datanode receiving the streamFile request tries to access
> a file with a space instead of + in the name and doesn't find that file.
> The proposed solution is to pass the filename as part of the URL, as with all
> the other HFTP commands, since this is the only place where it is not being
> treated this way. Are there any objections to this?
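
The "+" behaviour described in the issue can be reproduced with the JDK alone. The sketch below is illustrative only: the class name, method names, host, and port are hypothetical, and it assumes the server decodes query parameters with form-style rules (as java.net.URLDecoder does) rather than showing the actual HFTP servlet code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo, not Hadoop code: contrasts a file name sent as a query
// parameter with the same name sent as part of the URL path.
final class PlusDecodingDemo {

    // Form-style decoding of a query value: a raw '+' comes back as a space.
    static String decodeQueryValue(String raw) {
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Form-style encoding: '+' becomes %2B, so it survives a later decode.
    static String encodeQueryValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Carries the name in the URL path and reads it back; '+' is an ordinary
    // character in a URI path and is not turned into a space.
    static String pathRoundTrip(String path) {
        try {
            // Host and port are placeholders for illustration.
            URI uri = new URI("http", null, "example.invalid", 50075, path, null, null);
            return URI.create(uri.toASCIIString()).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String name = "/data/a+b.txt";
        System.out.println(decodeQueryValue(name));                   // prints /data/a b.txt
        System.out.println(decodeQueryValue(encodeQueryValue(name))); // prints /data/a+b.txt
        System.out.println(pathRoundTrip(name));                      // prints /data/a+b.txt
    }
}
```

The first line of output shows the failure mode from the report; the other two show the two ways around it: encoding the parameter before sending it, or moving the name into the path as the issue proposes.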