[ 
https://issues.apache.org/jira/browse/HDFS-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangmeng updated HDFS-5996:
---------------------------

    Description: I am a student from China; my research is Hive data storage 
on Hadoop. There is an HDFS write bug when I use the SQL: insert overwrite 
table wangmeng select * from testTable. (This SQL is translated into N map-only 
jobs with no reduce phase; each map corresponds to one HDFS output file on 
disk.) No matter what value N takes, there are always some DFSDataOutputStream 
buffers that never get written to disk. For example, with N=160 files there may 
be about 5 write-failure files. The size of each failed HDFS file on disk is 
always 0 bytes, rather than some value between 0 and the correct size. No 
exceptions are thrown, and the HDFS bytes-written statistics are completely 
correct. When I debug, I find that the failed DFS buffers hold exactly the 
correct values in memory, but the buffers are never written to disk, even 
though I call DFSDataOutputStream.flush() and DFSDataOutputStream.close(). I 
cannot find the reason why those DFS buffers fail to write. For now I work 
around the problem with a temporary file: if a DFS buffer is meant to be 
written to its final destination FINAL, I first write it to a temporary file 
TEM, and then move the TEM data to the destination simply by changing the HDFS 
file path. This method avoids the DFS buffer write failure. Now I want to fix 
this problem at its root, so how can I patch my code for this problem, and is 
there anything I can do? Many thanks.
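
For reference, a minimal sketch of the temporary-file workaround described 
above, assuming only the standard Hadoop 1.x FileSystem API; the class name, 
paths, and sample data are hypothetical and not taken from the reporter's code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TempFileWorkaround {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical paths standing in for the reporter's TEM and FINAL files.
        Path tmp = new Path("/tmp/wangmeng/part-00000._tmp");
        Path dst = new Path("/user/hive/warehouse/wangmeng/part-00000");

        // 1. Write everything to the temporary path and close the stream,
        //    which flushes the remaining client-side buffer to the DataNodes.
        FSDataOutputStream out = fs.create(tmp, true);
        try {
            out.write("example row data\n".getBytes("UTF-8"));
        } finally {
            out.close();
        }

        // 2. Only after a successful close, move the file onto its final
        //    destination; rename is a cheap NameNode metadata operation.
        if (!fs.rename(tmp, dst)) {
            throw new java.io.IOException("rename " + tmp + " -> " + dst + " failed");
        }
    }
}

With this pattern, readers of the final path never observe a partially written 
or empty file, because the rename happens only after close() has returned 
successfully.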

> Hadoop 1.1.2 HDFS write bug
> ------------------------------
>
>                 Key: HDFS-5996
>                 URL: https://issues.apache.org/jira/browse/HDFS-5996
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: fuse-dfs
>    Affects Versions: 1.1.2
>         Environment: one master and three slaves, all running normally
>            Reporter: wangmeng
>             Fix For: 1.1.2
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> I am a student from China; my research is Hive data storage on Hadoop. There 
> is an HDFS write bug when I use the SQL: insert overwrite table wangmeng 
> select * from testTable. (This SQL is translated into N map-only jobs with no 
> reduce phase; each map corresponds to one HDFS output file on disk.) No 
> matter what value N takes, there are always some DFSDataOutputStream buffers 
> that never get written to disk. For example, with N=160 files there may be 
> about 5 write-failure files. The size of each failed HDFS file on disk is 
> always 0 bytes, rather than some value between 0 and the correct size. No 
> exceptions are thrown, and the HDFS bytes-written statistics are completely 
> correct. When I debug, I find that the failed DFS buffers hold exactly the 
> correct values in memory, but the buffers are never written to disk, even 
> though I call DFSDataOutputStream.flush() and DFSDataOutputStream.close(). I 
> cannot find the reason why those DFS buffers fail to write. For now I work 
> around the problem with a temporary file: if a DFS buffer is meant to be 
> written to its final destination FINAL, I first write it to a temporary file 
> TEM, and then move the TEM data to the destination simply by changing the 
> HDFS file path. This method avoids the DFS buffer write failure. Now I want 
> to fix this problem at its root, so how can I patch my code for this problem, 
> and is there anything I can do? Many thanks.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
