Hi Raghu,

I understand that.
I have also read that there is something in the works which will address
some of this (Reader able to get data before Writer is completely done:
HADOOP-1700).

In my test the Writer and Reader are different threads (they could even be
different processes). So how does the Reader know that the Writer is done
writing the data? My requirement is that the Reader grab the data asap.

1. Previously I was relying on the Reader NOT getting the following
exception as its starting point:

07/11/17 11:07:13 INFO fs.DFSClient: Could not obtain block
blk_3484370064020998905 from any node: java.io.IOException: No live nodes
contain current block

2. Now I have added the following check on the Reader side:

    // Poll until the file shows up in the namespace.
    DistributedFileSystem fileSystem = new DistributedFileSystem();
    fileSystem.initialize(uri, conf);
    Path path = new Path(sKey);
    while (!fileSystem.exists(path)) {
        try {
            Thread.sleep(30);
        } catch (InterruptedException e) {
            // Restore the interrupt status rather than swallowing it.
            Thread.currentThread().interrupt();
            break;
        }
    }

But I still get this exception from time to time:

07/11/17 11:07:10 INFO fs.DFSClient: Could not obtain block
blk_8590062477849775138 from any node: java.io.IOException: No live nodes
contain current block
07/11/17 11:07:10 WARN fs.DFSClient: DFS Read: java.io.IOException:
Blocklist for /hadoopdata0.txt has changed!
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
        at java.io.DataInputStream.read(DataInputStream.java:80)
        at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:206)
java.io.IOException: Blocklist for /hadoopdata0.txt has changed!
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
        at java.io.DataInputStream.read(DataInputStream.java:80)
        at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:206)
07/11/17 11:07:10 INFO fs.DFSClient: Could not obtain block
blk_3484370064020998905 from any node: java.io.IOException: No live nodes
contain current block

I could build an explicit hand-off from the Writer to the Reader, but that
would be tricky between processes (one filesystem-level possibility is
sketched in the P.S. at the bottom of this message).

Any ideas?

Thanx,
Taj


Raghu Angadi wrote:
>
> Taj,
>
> I don't know what you are trying to do, but simultaneous write and read
> won't work on any filesystem (unless the reader is more complicated than
> what you had).
>
> For now, I think you will get the most predictable behaviour if you read
> after the writer has closed the file.
>
> Raghu.
>
> j2eeiscool wrote:
>> Hi Dhruba,
>>
>> For my test I do have a Reader and Writer thread.
>> The Reader blocks till the InputStream is available.
>>
>> The Reader gets the following exception till the Writer is done:
>>
>> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot open
>> filename /hadoopdata0.txt
>>         at org.apache.hadoop.dfs.NameNode.open(NameNode.java:269)
>>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
>>
>>         at org.apache.hadoop.ipc.Client.call(Client.java:470)
>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
>>         at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>>         at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:585)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>         at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:864)
>>         at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:856)
>>         at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:277)
>>         at org.apache.hadoop.dfs.DistributedFileSystem.open(DistributedFileSystem.java:122)
>>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:244)
>>         at HadoopDSMStore.select(HadoopDSMStore.java:44)
>>         at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:174)
>> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot open
>> filename /hadoopdata0.txt
>>         at org.apache.hadoop.dfs.NameNode.open(NameNode.java:269)
>>         at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
>>
>>         at HadoopDSMStore.select(HadoopDSMStore.java:44)
>>         at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:174)
>>
>> 1. Is there an API (like isFileAvailable(fileName)) the Reader needs to
>> check before starting?
>>
>> 2. Should there be a delay between Writer end and Reader start?
>>
>> Thanx,
>> Taj
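
P.S. I suspect the exists() check is not enough because the file shows up
in the namespace as soon as the Writer calls create(), before the data is
written, so the Reader can still catch the file mid-write. One hand-off
that should work even across processes, since it goes through HDFS itself,
would be for the Writer to write to a temporary name and rename it to the
final name only after close(); the rename is a single NameNode operation,
so the Reader never sees a half-written /hadoopdata0.txt. A rough sketch
only -- the .tmp name and the class/method names below are made up for
illustration:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RenameHandoff {

        // Writer side: write everything to a temporary path, close it,
        // then publish the finished file under its final name.
        public static void write(Configuration conf, byte[] data) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path tmp = new Path("/hadoopdata0.txt.tmp");  // assumed temp name
            Path fin = new Path("/hadoopdata0.txt");
            FSDataOutputStream out = fs.create(tmp);
            try {
                out.write(data);
            } finally {
                out.close();  // all blocks are complete once this returns
            }
            fs.rename(tmp, fin);  // atomic publish in the namespace
        }

        // Reader side: poll for the final name; once it exists, the file
        // is complete and its blocklist cannot change underneath us.
        public static FSDataInputStream waitAndOpen(Configuration conf) throws IOException {
            FileSystem fs = FileSystem.get(conf);
            Path fin = new Path("/hadoopdata0.txt");
            while (!fs.exists(fin)) {
                try {
                    Thread.sleep(30);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IOException("Interrupted waiting for " + fin);
                }
            }
            return fs.open(fin);
        }
    }

The Reader's polling loop stays the same as before; the only change is that
the name being polled now doubles as the "Writer is done" signal.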