Aaron/Dhruba/Ted,
Many thanx for your replies.
I took the temporary-file-then-rename route and am past this exception now.
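
For anyone who hits the same thing, the write side ended up looking roughly like
this (the namenode URI, paths, and payload below are placeholders, not exactly
what I run):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.dfs.DistributedFileSystem;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;

public class TempThenRenameWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DistributedFileSystem fs = new DistributedFileSystem();
        fs.initialize(new URI("hdfs://localhost:9000"), conf);  // placeholder namenode address

        // Dummy name the Reader never looks for, and the real name it polls on.
        Path tmpPath = new Path("/hadoopdata0.txt.tmp");
        Path finalPath = new Path("/hadoopdata0.txt");

        // Write everything under the dummy name and close the stream first,
        // so the data is complete before the file becomes visible to the Reader.
        FSDataOutputStream out = fs.create(tmpPath);
        try {
            out.write("some payload".getBytes("UTF-8"));
        } finally {
            out.close();
        }

        // Only now does the name the Reader is waiting on appear.
        fs.rename(tmpPath, finalPath);
    }
}

The Reader just keeps polling exists() on /hadoopdata0.txt (as in the snippet
quoted further down) and opens it once the rename has happened, so it never
sees a half-written file.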
-Taj
Aaron Kimball wrote:
>
> You could write the file out under a dummy name and then rename it to
> the target filename after the write is complete. The reader simply
> blocks until the correct filename exists.
>
> - Aaron
>
> j2eeiscool wrote:
>> Hi Raghu,
>>
>> I understand that.
>>
>> I have also read that there is something in the works that will address
>> some of this (the Reader being able to get data before the Writer is
>> completely done: HADOOP-1700).
>>
>>
>> In my test the Writer and Reader are different threads (they could even be
>> different processes).
>>
>> So how does the Reader know that the Writer is done writing the data (my
>> requirement is that the Reader grab the data asap)?
>>
>> 1. Previously I was relying on the Reader NOT getting the exception
>> (07/11/17 11:07:13 INFO fs.DFSClient: Could not obtain block
>> blk_3484370064020998905 from any node: java.io.IOException: No live nodes
>> contain current block) as the signal that it was safe to start reading.
>>
>> 2. Now I have added the following check on the Reader side:
>>
>> DistributedFileSystem fileSystem = new DistributedFileSystem();
>> fileSystem.initialize(uri, conf);
>> Path path = new Path(sKey);
>> // Poll until the file shows up in the namespace before trying to open it.
>> while (!fileSystem.exists(path)) {
>>     try {
>>         Thread.sleep(30);
>>     } catch (InterruptedException e) {
>>         e.printStackTrace();
>>     }
>> }
>>
>> But I still get this exception from time to time:
>>
>> 07/11/17 11:07:10 INFO fs.DFSClient: Could not obtain block blk_8590062477849775138 from any node: java.io.IOException: No live nodes contain current block
>> 07/11/17 11:07:10 WARN fs.DFSClient: DFS Read: java.io.IOException: Blocklist for /hadoopdata0.txt has changed!
>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
>>     at java.io.DataInputStream.read(DataInputStream.java:80)
>>     at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:206)
>>
>> java.io.IOException: Blocklist for /hadoopdata0.txt has changed!
>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
>>     at java.io.DataInputStream.read(DataInputStream.java:80)
>>     at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:206)
>> 07/11/17 11:07:10 INFO fs.DFSClient: Could not obtain block blk_3484370064020998905 from any node: java.io.IOException: No live nodes contain current block
>>
>>
>> I could build an explicit hand-off from the Writer to the Reader, but that
>> would be tricky across processes.
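>> (Something like this is what I have in mind -- after the Writer closes the
>> data file it creates an empty marker file, and the Reader waits for the
>> marker instead of the data file itself. The marker name is made up and I
>> have not actually tried this; fileSystem is the same DistributedFileSystem
>> instance as in the snippet above:)
>>
>> // Writer side, right after closing the stream for /hadoopdata0.txt:
>> Path marker = new Path("/hadoopdata0.txt.done");
>> fileSystem.create(marker).close();   // empty file just signals "data is complete"
>>
>> // Reader side: wait for the marker, then open the real file.
>> while (!fileSystem.exists(marker)) {
>>     try {
>>         Thread.sleep(30);
>>     } catch (InterruptedException e) {
>>         e.printStackTrace();
>>     }
>> }
>> FSDataInputStream in = fileSystem.open(new Path("/hadoopdata0.txt"));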
>>
>> Any ideas?
>>
>> Thanx,
>> Taj
>>
>>
>>
>> Raghu Angadi wrote:
>>> Taj,
>>>
>>> I don't know what you are trying to do, but simultaneous write and read
>>> won't work on any filesystem (unless the reader is more complicated than
>>> what you had).
>>>
>>> For now, I think you will get the most predictable behaviour if you read
>>> after the writer has closed the file.
>>>
>>> Raghu.
>>>
>>> j2eeiscool wrote:
>>>> Hi Dhruba,
>>>>
>>>> For my test I do have a Reader and a Writer thread. The Reader blocks
>>>> till the InputStream is available.
>>>>
>>>> The Reader gets the following exception till the Writer is done:
>>>>
>>>> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot open filename /hadoopdata0.txt
>>>>     at org.apache.hadoop.dfs.NameNode.open(NameNode.java:269)
>>>>     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
>>>>
>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:470)
>>>>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
>>>>     at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>>>>     at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>     at java.lang.reflect.Method.invoke(Method.java:585)
>>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>>>     at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>>>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:864)
>>>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:856)
>>>>     at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:277)
>>>>     at org.apache.hadoop.dfs.DistributedFileSystem.open(DistributedFileSystem.java:122)
>>>>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:244)
>>>>     at HadoopDSMStore.select(HadoopDSMStore.java:44)
>>>>     at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:174)
>>>> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot open filename /hadoopdata0.txt
>>>>     at org.apache.hadoop.dfs.NameNode.open(NameNode.java:269)
>>>>     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
>>>>
>>>>     at HadoopDSMStore.select(HadoopDSMStore.java:44)
>>>>     at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:174)
>>>>
>>>>
>>>> 1. Is there an API (like isFileAvailable(fileName)) the Reader needs to
>>>> check before starting?
>>>>
>>>> 2. Should there be a delay between the Writer finishing and the Reader starting?
>>>>
>>>> Thanx,
>>>> Taj
>>>
>>>
>>>
>>
>
>