Raghu et al: I reproduced all my experiments, only this time on an EC2 node, and they all ran without incident, so I now suspect a machine or hardware configuration issue. I am going to try a more controlled series of experiments this weekend, on a machine that I can give Raghu access to if I can reproduce the issues. More later... Thanks for all the assistance...much appreciated.
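[Editorial sketch of what such a controlled experiment might look like, assuming a stock Hadoop install; the file size, paths, and checksum step are illustrative, not from the thread. It generates a file of known content, records its checksum, then round-trips it through HDFS:]

```shell
# Hypothetical repro script -- sizes and paths are placeholders.
set -e

# Generate a test file of known random content (64 MB here; scale up as needed).
dd if=/dev/urandom of=/tmp/sample.dat bs=1M count=64 2>/dev/null

# Record the local checksum before uploading.
md5sum /tmp/sample.dat | awk '{print $1}' > /tmp/sample.dat.md5

# On a running cluster (0.14-era command syntax), uncomment:
# bin/hadoop dfs -copyFromLocal /tmp/sample.dat /input/sample.dat
# bin/hadoop dfs -copyToLocal /input/sample.dat /tmp/sample.roundtrip
# md5sum /tmp/sample.roundtrip   # should match the checksum recorded above
```

Comparing checksums before and after the round trip separates a corrupting copy path from a copy that merely fails partway through.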
Raghu Angadi <[EMAIL PROTECTED]> wrote:

C G,

Any specifics on how you reproduce any of these issues will be helpful. I was able to copy a 5GB file without errors. copyFromLocal just copies raw file content. Not sure what '5,000,000 rows' means.

Raghu.

C G wrote:
> Further experimentation, again in a single-node configuration on a 4-way 8G machine w/0.14.0: trying to copyFromLocal 669M of data in 5,000,000 rows, I see this in the namenode log:
>
> 2007-08-24 00:50:45,902 WARN org.apache.hadoop.dfs.StateChange: DIR* NameSystem.completeFile: failed to complete /input/t.dat because dir.getFileBlocks() is non-null and pendingFile is null
> 2007-08-24 00:50:48,000 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310, call complete(/input/t.dat, DFSClient_-2013541261) from XXX.XXX.XXX.XX:36470: error: java.io.IOException: Could not complete write to file /input/t.dat by DFSClient_-2013541261
> java.io.IOException: Could not complete write to file /input/t.dat by DFSClient_-2013541261
>     at org.apache.hadoop.dfs.NameNode.complete(NameNode.java:359)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
>
> Any thoughts or help appreciated... I'm planning to build out a large grid running terabytes of data... assuming I can get Hadoop to handle more than 500M :-(.
>
> Thanks!
>
> Raghu Angadi wrote:
> Regd the second problem:
>
> It is surprising that this fails repeatedly around the same place. 0.14 does check the checksum at the datanode (0.13 did not do this check). I will try to reproduce this.
>
> Raghu.
> C G wrote:
>> Hi All:
>>
>> Second issue is a failure on copyFromLocal with lost connections. I'm trying to copy a 5.8G, 88,784,045-row file to HDFS. It makes progress for a while, but at approximately 2.1 gigs copied, it dies with a repeated series of errors. There is 470G free on the file system. The error is repeated several times and is:
>>
>> $ bin/hadoop dfs -copyFromLocal sample.dat /input/sample.dat
>> 07/08/23 15:58:10 WARN fs.DFSClient: Error while writing.
>> java.net.SocketException: Connection reset
>>     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96)
>>     at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>>     at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1656)
>>     at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:1610)
>>     at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:140)
>>     at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:100)
>>     at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:39)
>>     at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>     at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:258)
>>     at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:248)
>>     at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:133)
>>     at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:776)
>>     at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:757)
>>     at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:116)
>>     at org.apache.hadoop.fs.FsShell.run(FsShell.java:1229)
>>     at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187)
>>     at org.apache.hadoop.fs.FsShell.main(FsShell.java:1342)
>>
>> The following error also appears several times in the datanode logs:
>>
>> 2007-08-23 15:58:10,072 ERROR org.apache.hadoop.dfs.DataNode: DataXceiver: java.io.IOException: Unexpected checksum mismatch while writing blk_1461965301876815406 from /xxx.xxx.xxx.xx:50960
>>     at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:902)
>>     at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:727)
>>     at java.lang.Thread.run(Thread.java:595)
>>
>> Any help on these issues much appreciated.
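[Editorial note: the datanode's "Unexpected checksum mismatch while writing" means the block data it received did not match the checksum the client computed, which is consistent with the hardware suspicion voiced above. A quick local sanity check, sketched below with illustrative paths and sizes (no substitute for proper memory or disk diagnostics), is to re-read the same file several times and confirm the checksum stays stable:]

```shell
set -e
FILE=/tmp/hwcheck.dat

# Write a file of random data once...
dd if=/dev/urandom of="$FILE" bs=1M count=16 2>/dev/null
REF=$(md5sum "$FILE" | awk '{print $1}')

# ...then re-read it several times; a differing checksum on any pass
# points at flaky disk, RAM, or controller rather than Hadoop itself.
for i in 1 2 3 4 5; do
  CUR=$(md5sum "$FILE" | awk '{print $1}')
  [ "$CUR" = "$REF" ] || { echo "mismatch on pass $i"; exit 1; }
done
echo "local reads stable"
```

A stable local checksum does not clear the NIC or switch, but a failing one would localize the corruption before involving HDFS at all.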