Raghu et al: I reproduced all my experiments, only this time on an EC2 node, and they all ran without incident, so I now suspect a machine or hardware configuration issue. I am going to try a more controlled series of experiments this weekend, on a machine that I can give Raghu access to if I can reproduce the issues. More later... Thanks for all the assistance...much appreciated.
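[Editorial sketch of what such a controlled experiment might look like, assuming a stock Hadoop install; the file size, paths, and checksum step are illustrative, not from the thread. It generates a file of known content, records its checksum, then round-trips it through HDFS:]

```shell
# Hypothetical repro script -- sizes and paths are placeholders.
set -e

# Generate a test file of known random content (64 MB here; scale up as needed).
dd if=/dev/urandom of=/tmp/sample.dat bs=1M count=64 2>/dev/null

# Record the local checksum before uploading.
md5sum /tmp/sample.dat | awk '{print $1}' > /tmp/sample.dat.md5

# On a running cluster (0.14-era command syntax), uncomment:
# bin/hadoop dfs -copyFromLocal /tmp/sample.dat /input/sample.dat
# bin/hadoop dfs -copyToLocal /input/sample.dat /tmp/sample.roundtrip
# md5sum /tmp/sample.roundtrip   # should match the checksum recorded above
```

Comparing checksums before and after the round trip separates a corrupting copy path from a copy that merely fails partway through.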
Raghu Angadi <[EMAIL PROTECTED]> wrote:

C G,

Any specifics on how you reproduce any of these issues will be helpful. I was able to copy a 5GB file without errors. copyFromLocal just copies raw file content. Not sure what '5,000,000 rows' means.

Raghu.

C G wrote:
> Further experimentation, again in a single-node configuration on a 4-way 8G machine w/0.14.0: trying to copyFromLocal 669M of data in 5,000,000 rows, I see this in the namenode log:
>
> 2007-08-24 00:50:45,902 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.completeFile: failed to complete /input/t.dat because dir.getFileBlocks() is non-null and pendingFile is null
> 2007-08-24 00:50:48,000 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310, call complete(/input/t.dat, DFSClient_-2013541261) from XXX.XXX.XXX.XX:36470: error: java.io.IOException: Could not complete write to file /input/t.dat by DFSClient_-2013541261
> java.io.IOException: Could not complete write to file /input/t.dat by DFSClient_-2013541261
>     at org.apache.hadoop.dfs.NameNode.complete(NameNode.java:359)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
>
> Any thoughts or help appreciated... I'm planning to build out a large grid running terabytes of data... assuming I can get Hadoop to handle more than 500M :-(.
>
> Thanks!
>
> Raghu Angadi wrote:
> Regd the second problem:
>
> It is surprising that this fails repeatedly around the same place. 0.14 does check the checksum at the datanode (0.13 did not do this check). I will try to reproduce this.
>
> Raghu.
> C G wrote:
>> Hi All:
>>
>> Second issue is a failure on copyFromLocal with lost connections. I'm trying to copy a 5.8G, 88,784,045-row file to HDFS. It makes progress for a while, but at approximately 2.1 gigs copied, it dies with a repeated series of errors. There is 470G free on the file system. The error is repeated several times and is:
>>
>> $ bin/hadoop dfs -copyFromLocal sample.dat /input/sample.dat
>> 07/08/23 15:58:10 WARN fs.DFSClient: Error while writing.
>> java.net.SocketException: Connection reset
>>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
>>     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>>     at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1656)
>>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:1610)
>>     at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:140)
>>     at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:100)
>>     at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:39)
>>     at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>     at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:258)
>>     at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:248)
>>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:133)
>>     at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:776)
>>     at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:757)
>>     at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:116)
>>     at org.apache.hadoop.fs.FsShell.run(FsShell.java:1229)
>>     at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187)
>>     at org.apache.hadoop.fs.FsShell.main(FsShell.java:1342)
>>
>> The following error also appears several times in the datanode logs:
>>
>> 2007-08-23 15:58:10,072 ERROR org.apache.hadoop.dfs.DataNode: DataXceiver: java.io.IOException: Unexpected checksum mismatch while writing blk_1461965301876815406 from /xxx.xxx.xxx.xx:50960
>>     at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:902)
>>     at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:727)
>>     at java.lang.Thread.run(Thread.java:595)
>>
>> Any help on these issues much appreciated.
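[Editorial note: the datanode's "Unexpected checksum mismatch while writing" means the block data it received did not match the checksum the client computed, which is consistent with the hardware suspicion voiced above. A quick local sanity check, sketched below with illustrative paths and sizes (no substitute for proper memory or disk diagnostics), is to re-read the same file several times and confirm the checksum stays stable:]

```shell
set -e
FILE=/tmp/hwcheck.dat

# Write a file of random data once...
dd if=/dev/urandom of="$FILE" bs=1M count=16 2>/dev/null
REF=$(md5sum "$FILE" | awk '{print $1}')

# ...then re-read it several times; a differing checksum on any pass
# points at flaky disk, RAM, or controller rather than Hadoop itself.
for i in 1 2 3 4 5; do
  CUR=$(md5sum "$FILE" | awk '{print $1}')
  [ "$CUR" = "$REF" ] || { echo "mismatch on pass $i"; exit 1; }
done
echo "local reads stable"
```

A stable local checksum does not clear the NIC or switch, but a failing one would localize the corruption before involving HDFS at all.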