Hi,
I'm writing a fairly simple client application that basically concatenates
the output files of a MapReduce job (Hadoop 0.20.2). The code is as follows:
DFSClient client = new DFSClient(new Configuration());
FileStatus[] listing = client.listPaths("/myoutputdir");

int read = 0;
byte[] buffer = new byte[2048];
StringBuilder builder = new StringBuilder();

for (FileStatus file : listing) {
    if (builder.length() > 0)
        builder.append("\n");
    String filename = file.getPath().getName();
    if (filename.startsWith("part")) {
        InputStream input = client.open(filename);
        read = input.read(buffer);
        while (read > 0) {
            builder.append(new String(buffer, 0, read));
            read = input.read(buffer);
        }
        input.close();
    }
}
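For what it's worth, the read-and-append loop itself seems sound: fed from a plain in-memory stream, with no Hadoop involved, the same pattern concatenates as expected. Here is that loop as a self-contained sketch (class and method names are mine, just for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadLoopSketch {
    // Identical buffered read-and-append logic to the snippet above,
    // just fed from an in-memory stream instead of DFSClient.open().
    static String slurp(InputStream input) throws IOException {
        int read;
        byte[] buffer = new byte[2048];
        StringBuilder builder = new StringBuilder();
        read = input.read(buffer);
        while (read > 0) {
            builder.append(new String(buffer, 0, read));
            read = input.read(buffer);
        }
        input.close();
        return builder.toString();
    }

    public static void main(String[] args) throws IOException {
        String result = slurp(new ByteArrayInputStream("part file contents".getBytes()));
        System.out.println(result); // prints "part file contents"
    }
}
```

So I'm fairly confident the failure is in the interaction with HDFS, not in the buffering.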
I've been tracing the RPC calls, and most of them work. For instance, I see a
getListing("/myoutputdir") call, after which I successfully retrieve the files
in the directory. Once execution reaches the client.open() statement, however,
a call goes out as "getBlockLocations(part-r-00000, 0, 671088640)", which I
believe (going out on a limb here) looks up the block locations for the file
part-r-00000.
Unfortunately this call fails and, worse yet, the debugging information is
slim. All I get back is:

org.apache.hadoop.ipc.RemoteException: java.io.IOException:
java.lang.NullPointerException
        at org.apache.hadoop.ipc.Client.call(Client.java:740)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy54.getBlockLocations(Unknown Source)
        ...
Since the exception occurs on the remote side, the stack trace isn't much
help: Client.java:740 is just the statement that fills in the trace on the
caller. I couldn't find anything in the namenode log either. Any suggestions?
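One detail I noticed while writing this up: the name in the getBlockLocations call matches what getPath().getName() returns, i.e. only the final path component, with the directory stripped. A quick java.nio analogy (the path is just illustrative, not from my cluster):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class NameVsPath {
    public static void main(String[] args) {
        // Illustrative path mirroring the job output directory above.
        Path part = Paths.get("/myoutputdir/part-r-00000");
        // getFileName() plays the same role as Hadoop Path.getName():
        // only the last component survives.
        System.out.println(part.getFileName()); // part-r-00000
        System.out.println(part);               // /myoutputdir/part-r-00000
    }
}
```

I'm not sure whether that bare name is what the namenode expects to receive, or a symptom of something in my code.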