You could write the file out under a dummy name and then rename it to
the target filename after the write is complete; the rename is a single
namenode metadata operation, so the target name never refers to a
partially written file. The reader simply blocks until the correct
filename exists.
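For example (a minimal sketch; AtomicPublish and its methods are
illustrative names, not part of the Hadoop API):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AtomicPublish {

    // Writer side: write and close under a temporary name, then rename to the target.
    public static void publish(FileSystem fs, Path target, byte[] data) throws IOException {
        Path tmp = new Path(target.getParent(), target.getName() + ".tmp");
        FSDataOutputStream out = fs.create(tmp);
        try {
            out.write(data);
        } finally {
            out.close();        // every block is committed before the file is published
        }
        fs.rename(tmp, target); // the target name appears only once the data is complete
    }

    // Reader side: block until the target name appears; it is then safe to open.
    public static void waitFor(FileSystem fs, Path target) throws IOException, InterruptedException {
        while (!fs.exists(target)) {
            Thread.sleep(100);  // poll interval is arbitrary
        }
    }
}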
- Aaron
j2eeiscool wrote:
Hi Raghu,
I understand that.
I have also read that there is something in the works which will address
some of this (Reader able to get data before Writer is completely done:
HADOOP-1700).
In my test the Writer and Reader are different threads (they could even be
different processes).
So how does the Reader know that the Writer is done writing the data (my
requirement is that the Reader grab the data as soon as possible)?
1. Previously I was relying on the Reader NOT getting the exception
(07/11/17 11:07:13 INFO fs.DFSClient: Could not obtain block
blk_3484370064020998905 from any node: java.io.IOException: No live nodes
contain current block) as a starting point for the Reader.
2. Now I have added the following check on the Reader side:
DistributedFileSystem fileSystem = new DistributedFileSystem();
fileSystem.initialize(uri, conf);
Path path = new Path(sKey);
// Poll until the filename shows up in the namespace.
while (!fileSystem.exists(path)) {
    try {
        Thread.sleep(30);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag instead of swallowing it
        break;
    }
}
But I still get this exception from time to time:
07/11/17 11:07:10 INFO fs.DFSClient: Could not obtain block blk_8590062477849775138 from any node: java.io.IOException: No live nodes contain current block
07/11/17 11:07:10 WARN fs.DFSClient: DFS Read: java.io.IOException: Blocklist for /hadoopdata0.txt has changed!
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
    at java.io.DataInputStream.read(DataInputStream.java:80)
    at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:206)
java.io.IOException: Blocklist for /hadoopdata0.txt has changed!
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:871)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1161)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1004)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1107)
    at java.io.DataInputStream.read(DataInputStream.java:80)
    at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:206)
07/11/17 11:07:10 INFO fs.DFSClient: Could not obtain block blk_3484370064020998905 from any node: java.io.IOException: No live nodes contain current block
I could build an explicit hand-off from the Writer to the Reader, but that
would be tricky across processes (one filesystem-based approach is sketched
after this message).
Any ideas?
Thanx,
Taj
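One way to make that hand-off work across processes is to route it through
HDFS itself: after closing the data file, the writer creates an empty marker
file, and the reader waits for the marker instead of the data. A minimal
sketch (the ".done" suffix and all names here are illustrative, not part of
the Hadoop API):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MarkerHandoff {

    private static final String DONE_SUFFIX = ".done"; // illustrative convention

    // Writer: call only after the data file has been fully written and closed.
    public static void signalDone(FileSystem fs, Path dataFile) throws IOException {
        Path marker = new Path(dataFile.getParent(), dataFile.getName() + DONE_SUFFIX);
        FSDataOutputStream out = fs.create(marker);
        out.close(); // the marker is empty; its existence is the signal
    }

    // Reader: block until the marker exists; the data file is then complete.
    public static void awaitDone(FileSystem fs, Path dataFile) throws IOException, InterruptedException {
        Path marker = new Path(dataFile.getParent(), dataFile.getName() + DONE_SUFFIX);
        while (!fs.exists(marker)) {
            Thread.sleep(100);
        }
    }
}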
Raghu Angadi wrote:
Taj,
I don't know what you are trying to do, but simultaneous write and read
won't work on any filesystem (unless the reader is more complicated than
what you had).
For now, I think you will get the most predictable behaviour if you read
after the writer has closed the file.
Raghu.
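A reader that is "more complicated" in the sense above might simply re-open
the file and re-read from scratch whenever a mid-write error such as
"Blocklist ... has changed!" aborts it. A minimal sketch, assuming the writer
does eventually close the file (class and method names are illustrative;
maxAttempts must be at least 1):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RetryingReader {

    // Re-open and re-read the whole file whenever a mid-write IOException aborts the read.
    public static byte[] readWithRetry(FileSystem fs, Path path, int maxAttempts)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                FSDataInputStream in = fs.open(path);
                try {
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[4096];
                    int n;
                    while ((n = in.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                    return buf.toByteArray();
                } finally {
                    in.close();
                }
            } catch (IOException e) {
                last = e;          // most likely the writer is still adding blocks
                Thread.sleep(100); // back off, then start over from the beginning
            }
        }
        throw last; // maxAttempts >= 1, so last is non-null here
    }
}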
j2eeiscool wrote:
Hi Dhruba,
For my test I do have a Reader and a Writer thread. The Reader blocks until
the InputStream is available.
The Reader gets the following exception until the Writer is done:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot open filename /hadoopdata0.txt
    at org.apache.hadoop.dfs.NameNode.open(NameNode.java:269)
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
    at org.apache.hadoop.ipc.Client.call(Client.java:470)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
    at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
    at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.dfs.$Proxy0.open(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:864)
    at org.apache.hadoop.dfs.DFSClient$DFSInputStream.<init>(DFSClient.java:856)
    at org.apache.hadoop.dfs.DFSClient.open(DFSClient.java:277)
    at org.apache.hadoop.dfs.DistributedFileSystem.open(DistributedFileSystem.java:122)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:244)
    at HadoopDSMStore.select(HadoopDSMStore.java:44)
    at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:174)
org.apache.hadoop.ipc.RemoteException: java.io.IOException: Cannot open filename /hadoopdata0.txt
    at org.apache.hadoop.dfs.NameNode.open(NameNode.java:269)
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:566)
    at HadoopDSMStore.select(HadoopDSMStore.java:44)
    at HadoopDSMStore$ReaderThread.run(HadoopDSMStore.java:174)
1. Is there an API (like isFileAvailable(fileName)) the Reader needs to check
before starting?
2. Should there be a delay between the Writer finishing and the Reader starting?
Thanx,
Taj