Aaron/Dhruba/Ted,
Many thanx for your replies.
I took the temporary-file-then-rename route and am past this exception now.
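
For anyone who hits the same thing, the write side ended up looking roughly like
this (the namenode URI, paths, and payload below are placeholders, not exactly
what I run):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.dfs.DistributedFileSystem;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;

public class TempThenRenameWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DistributedFileSystem fs = new DistributedFileSystem();
        fs.initialize(new URI("hdfs://localhost:9000"), conf);  // placeholder namenode address

        // Dummy name the Reader never looks for, and the real name it polls on.
        Path tmpPath = new Path("/hadoopdata0.txt.tmp");
        Path finalPath = new Path("/hadoopdata0.txt");

        // Write everything under the dummy name and close the stream first,
        // so the data is complete before the file becomes visible to the Reader.
        FSDataOutputStream out = fs.create(tmpPath);
        try {
            out.write("some payload".getBytes("UTF-8"));
        } finally {
            out.close();
        }

        // Only now does the name the Reader is waiting on appear.
        fs.rename(tmpPath, finalPath);
    }
}

The Reader just keeps polling exists() on /hadoopdata0.txt (as in the snippet
quoted further down) and opens it once the rename has happened, so it never
sees a half-written file.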
-Taj
Aaron Kimball wrote:
>
> You could write the file out under a dummy name and then rename it to
> the target filename after the write is complete. The reader simply
> blocks until the correct filename exists.
>
> - Aaron
>
> j2eeiscool wrote:
>> Hi Raghu,
>>
>> I understand that.
>>
>> I have also read that there is something in the works that will address
>> some of this (the Reader being able to get data before the Writer is
>> completely done: HADOOP-1700).
>>
>>
>> In my test the Writer and Reader are different threads (they could even be
>> different processes).
>>
>> So how does the Reader know that the Writer is done writing the data (my
>> requirement is that the Reader grab the data asap)?
>>
>> 1. Previously I was relying on the Reader NOT getting the exception
>> (07/11/17 11:07:13 INFO fs.DFSClient: Could not obtain block
>> blk_3484370064020998905 from any node: java.io.IOException: No live nodes
>> contain current block) as the signal that it was safe to start reading.
>>
>> 2. Now I have added the following check on the Reader side:
>>
>> DistributedFileSystem fileSystem = new DistributedFileSystem();
>> fileSystem.initialize(uri, conf);
>> Path path = new Path(sKey);
>> // Poll until the file shows up in the namespace before trying to open it.
>> while (!fileSystem.exists(path)) {
>>     try {
>>         Thread.sleep(30);
>>     } catch (InterruptedException e) {
>>         e.printStackTrace();
>>     }
>> }
>>
>> But I still get this exception from time to time:
>>
>> 07/11/17 11:07:10 INFO fs.DFSClient: Could not obtain block blk_8590062477849775138 from any node: java.io.IOException: No live nodes contain current block
>> 07/11/17 11:07:10 WARN fs.DFSClient: DFS Read: java.io.IOException: Blocklist for /hadoopdata0.txt has changed!
>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
>>     at java.io.DataInputStream.read(DataInputStream.java:80)
>>     at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:206)
>>
>> java.io.IOException: Blocklist for /hadoopdata0.txt has changed!
>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
>>     at java.io.DataInputStream.read(DataInputStream.java:80)
>>     at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:206)
>> 07/11/17 11:07:10 INFO fs.DFSClient: Could not obtain block blk_3484370064020998905 from any node: java.io.IOException: No live nodes contain current block
>>
>>
>> I could build an explicit hand-off from the Writer to the Reader, but that
>> would be tricky across processes.
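>> (Something like this is what I have in mind -- after the Writer closes the
>> data file it creates an empty marker file, and the Reader waits for the
>> marker instead of the data file itself. The marker name is made up and I
>> have not actually tried this; fileSystem is the same DistributedFileSystem
>> instance as in the snippet above:)
>>
>> // Writer side, right after closing the stream for /hadoopdata0.txt:
>> Path marker = new Path("/hadoopdata0.txt.done");
>> fileSystem.create(marker).close();   // empty file just signals "data is complete"
>>
>> // Reader side: wait for the marker, then open the real file.
>> while (!fileSystem.exists(marker)) {
>>     try {
>>         Thread.sleep(30);
>>     } catch (InterruptedException e) {
>>         e.printStackTrace();
>>     }
>> }
>> FSDataInputStream in = fileSystem.open(new Path("/hadoopdata0.txt"));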
>>
>> Any ideas?
>>
>> Thanx,
>> Taj
>>
>>
>>
>> Raghu Angadi wrote:
>>> Taj,
>>>
>>> I don't know what you are trying to do, but simultaneous write and read
>>> won't work on any filesystem (unless the reader is more complicated than
>>> what you had).
>>>
>>> For now, I think you will get the most predictable behaviour if you read
>>> after the writer has closed the file.
>>>
>>> Raghu.
>>>
>>> j2eeiscool wrote:
>>>> Hi Dhruba,
>>>>
>>>> For my test I do have a Reader and a Writer thread. The Reader blocks
>>>> till the InputStream is available.
>>>>
>>>> The Reader gets the following exception till the Writer is done:
>>>>
>>>> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot open filename /hadoopdata0.txt
>>>>     at org.apache.hadoop.dfs.NameNode.open(NameNode.java:269)
>>>>     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
>>>>
>>>>     at org.apache.hadoop.ipc.Client.call(Client.java:470)
>>>>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
>>>>     at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>>>>     at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>     at java.lang.reflect.Method.invoke(Method.java:585)
>>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>>>     at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
>>>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:864)
>>>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:856)
>>>>     at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:277)
>>>>     at org.apache.hadoop.dfs.DistributedFileSystem.open(DistributedFileSystem.java:122)
>>>>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:244)
>>>>     at HadoopDSMStore.select(HadoopDSMStore.java:44)
>>>>     at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:174)
>>>> org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot open filename /hadoopdata0.txt
>>>>     at org.apache.hadoop.dfs.NameNode.open(NameNode.java:269)
>>>>     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
>>>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
>>>>
>>>>     at HadoopDSMStore.select(HadoopDSMStore.java:44)
>>>>     at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:174)
>>>>
>>>>
>>>> 1. Is there an API (like isFileAvailable(fileName)) the Reader needs to
>>>> check before starting?
>>>>
>>>> 2. Should there be a delay between the Writer finishing and the Reader starting?
>>>>
>>>> Thanx,
>>>> Taj
>>>
>>>
>>>
>>
>
>