I could not understand much from the code snippet, so here are a few quick questions that may help. Is the file in question a local (normal disk) file or a DFS file? If it is local, do you have the same file at the same path on all the nodes? If not, you should either replicate it to the same path on all nodes or first copy it into DFS with Hadoop's -copyFromLocal command (hadoop fs -copyFromLocal).
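To make the local-versus-DFS question concrete, here is a minimal sketch in plain Java (no Hadoop classes) that classifies a path string by its URI scheme. The class name PathKind, both helper methods, and the host "namenode:8020" are hypothetical and only for illustration. The one Hadoop-specific point it relies on is that a path with no scheme is resolved against whatever default filesystem the client configuration names, so the same string can mean a local file on one setup and a DFS file on another.

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PathKind {
    /** Hypothetical helper: classifies a path string by its URI scheme. */
    public static String classify(String path) {
        String scheme = URI.create(path).getScheme();
        if (scheme == null) {
            // No scheme: the client resolves it against its configured default filesystem.
            return "default-fs";
        }
        if (scheme.equals("file")) {
            return "local";
        }
        if (scheme.equals("hdfs")) {
            return "dfs";
        }
        return "other";
    }

    /** True only if the path is scheme-less or file: and exists on this machine's disk. */
    public static boolean existsLocally(String path) {
        URI uri = URI.create(path);
        String scheme = uri.getScheme();
        if (scheme != null && !scheme.equals("file")) {
            return false; // e.g. hdfs: paths are never looked up on the local disk
        }
        String local = (scheme == null) ? path : uri.getPath();
        return Files.exists(Paths.get(local));
    }

    public static void main(String[] args) {
        String[] samples = {
            "/myoutputdir/part-r-00000",        // path from the quoted code, no scheme
            "file:///myoutputdir/part-r-00000", // explicitly local
            "hdfs://namenode:8020/myoutputdir"  // hypothetical namenode address
        };
        for (String p : samples) {
            System.out.println(p + " -> " + classify(p)
                    + ", on local disk: " + existsLocally(p));
        }
    }
}
```

Running this on each node would show whether a scheme-less path such as /myoutputdir/part-r-00000 happens to exist on that node's local disk, which is the situation the questions above are trying to rule in or out.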
HTH.

On Tue, Sep 21, 2010 at 3:21 PM, Bill Streckfus <[email protected]> wrote:

> Hi,
>
> I'm writing a fairly simple client application which basically concatenates
> the output files of a MapReduce job (Hadoop 20.2). The code is as follows:
>
> DFSClient client = new DFSClient(new Configuration());
> FileStatus[] listing = client.listPaths("/myoutputdir");
> int read = 0;
> byte[] buffer = new byte[2048];
> StringBuilder builder = new StringBuilder();
> for(FileStatus file : listing) {
>     if(builder.length() > 0)
>         builder.append("\n");
>
>     String filename = file.getPath().getName();
>
>     if(filename.startsWith("part")) {
>         InputStream input = client.open(filename);
>
>         read = input.read(buffer);
>         while(read > 0) {
>             builder.append(new String(buffer, 0, read));
>             read = input.read(buffer);
>         }
>
>         input.close();
>     }
> }
>
> I'm following the RPC calls and I notice most of them are working. For
> instance, I see a call for "getListing(/myoutputdir)." Afterwards, I
> successfully retrieve the files in the directory. Once I reach the
> client.open() statement however, a call is sent
> "getBlockLocations(part-r-00000, 0, 671088640)" which I believe, going out
> on a limb here, finds the block locations for the file part-r-00000.
> Unfortunately this fails and worse yet, debugging information is slim. I get
> back:
>
> org.apache.hadoop.ipc.RemoteException: java.io.IOException:
> java.lang.NullPointerException
>
> at org.apache.hadoop.ip.Client.call(Client.java:740)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
> at $Proxy54.getBlockLocations (Unknown Source)
> ...
>
> Since the exception is on the remote side I don't get a lot of help from the
> stack trace. Client.java:740 is actually a statement to fill in the stack
> trace. I also couldn't find anything in the namenode log either. Any
> suggestions?
