[
https://issues.apache.org/jira/browse/HDFS-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
wangmeng updated HDFS-5996:
---------------------------
Description:
I am a student from China; my research is on Hive data storage on Hadoop.
There is an HDFS write bug when I use the SQL statement: insert overwrite
table wangmeng select * from testTable (this statement is translated into
N map-only jobs, with no reduce, and each map produces one HDFS output
file on disk). No matter what value N takes, there are always some
DFSDataOutputStream buffers that never get written to disk. For example,
with N=160 files there may be about 5 write failures, and each failed
HDFS file is always 0 bytes on disk rather than some size between 0 and
the correct size. No exception is thrown, and the HDFS bytes-written
counter reported by the job is completely correct.
When I debug, I find that the failed DFSDataOutputStream buffers hold
exactly the correct data, but that data never reaches disk even though I
call DFSDataOutputStream.flush() and DFSDataOutputStream.close(). I
cannot find the reason these buffers fail to write.
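
For reference, the write path I describe is roughly the following sketch
(the class name, output path, and data below are only illustrative, not my
real code, and I am assuming the stream comes from FileSystem.create()):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();         // reads core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);              // DistributedFileSystem when the default FS is hdfs://
            Path out = new Path("/user/wangmeng/part-00000");  // illustrative map output path

            FSDataOutputStream stream = fs.create(out, true);  // true = overwrite an existing file
            try {
                stream.write("one row of map output\n".getBytes("UTF-8"));
                stream.flush();   // flush the client-side buffer
                stream.sync();    // Hadoop 1.x: push buffered data to the datanodes
            } finally {
                stream.close();   // close() is what completes the file, so its length becomes visible
            }
            System.out.println("length on HDFS = " + fs.getFileStatus(out).getLen());
        }
    }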

For now I avoid the problem with a temporary file: if a buffer is supposed
to be written to its final destination FINAL, I first write it to a
temporary file TEM and then move the TEM data to the destination simply by
renaming the HDFS path. This workaround avoids the write failure. Now I
want to fix the problem at its root, so how can I patch my code for this
problem, and is there anything I can do? Many thanks.
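
The workaround looks roughly like this (the helper method and the .tmp
naming are only an illustration of the idea, not my actual patch):

    import java.io.IOException;

    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TempFileWorkaroundSketch {
        /** Write data to a temporary file first, then move it onto the final path by renaming. */
        public static void writeViaTemp(FileSystem fs, Path finalPath, byte[] data) throws IOException {
            Path tem = new Path(finalPath.getParent(), finalPath.getName() + ".tmp"); // the TEM file
            FSDataOutputStream out = fs.create(tem, true);
            try {
                out.write(data);
            } finally {
                out.close();
            }
            if (fs.exists(finalPath)) {
                fs.delete(finalPath, false);      // clear the destination so the rename can succeed
            }
            if (!fs.rename(tem, finalPath)) {     // on HDFS this is only a metadata change on the NameNode
                throw new IOException("rename failed: " + tem + " -> " + finalPath);
            }
        }
    }

Because the rename is only a metadata operation, the data written to TEM is
not copied a second time.
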
> hadoop 1.1.2 hdfs write bug
> ------------------------------
>
> Key: HDFS-5996
> URL: https://issues.apache.org/jira/browse/HDFS-5996
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: fuse-dfs
> Affects Versions: 1.1.2
> Environment: one master and three slaves, all of them normal
> Reporter: wangmeng
> Fix For: 1.1.2
>
> Original Estimate: 504h
> Remaining Estimate: 504h
>
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)