That's too long; buffer size alone does not explain it. The only small
problem I see in your code:
> totalBytesRead += bytesReadThisRead;
> fileNotReadFully = (bytesReadThisRead != -1);
totalBytesRead ends up off by one, since the final read() returns -1 and
that gets added before the loop exits. Not sure where totalBytesRead is used.
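For example, a minimal sketch of the same loop that only counts real reads:

byte[] data = new byte[4096];
long totalBytesRead = 0;
int bytesReadThisRead;
while ((bytesReadThisRead = is.read(data)) != -1) {
    totalBytesRead += bytesReadThisRead; // the final -1 never reaches the sum
}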
If you can, capture the traffic with tcpdump on your client machine (for
the datanode port, 50010).
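Something like this (assuming eth0 is your client's interface; adjust to
yours) should show the transfer:

tcpdump -i eth0 port 50010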
Raghu.
j2eeiscool wrote:
Hi Raghu,
Many thanx for your reply:
The write takes approximately 11367 ms.
The read takes approximately 1610565 ms.
The file size is 68573254 bytes and the HDFS block size is 64 MB.
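That works out to roughly 68573254 / 11367 ≈ 6 MB/s for the write, but only
68573254 / 1610565 ≈ 42 KB/s for the read, so the read is about 140 times
slower.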
Here is the Writer code:
FileInputStream fis = null;
OutputStream os = null;
try {
    fis = new FileInputStream(new File(inputFile));
    os = dsmStore.insert(outputFile);
dsmStore.insert does the following:
{
    DistributedFileSystem fileSystem = new DistributedFileSystem();
    fileSystem.initialize(uri, conf);
    Path path = new Path(sKey);
    // writing:
    FSDataOutputStream dataOutputStream = fileSystem.create(path);
    return dataOutputStream;
}
    byte[] data = new byte[4096];
    int bytesRead;
    while ((bytesRead = fis.read(data)) != -1) {
        // write only the bytes actually read, not the whole buffer
        os.write(data, 0, bytesRead);
    }
    os.flush();
} catch (Exception e) {
    e.printStackTrace();
}
finally {
    if (os != null) {
        try {
            os.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    if (fis != null) {
        try {
            fis.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
}
Here is the Reader code:
byte[] data = new byte[4096];
int totalBytesRead = 0;
boolean fileNotReadFully = true;
InputStream is = dsmStore.select(fileName);
dsmStore.select does the following:
{
    DistributedFileSystem fileSystem = new DistributedFileSystem();
    fileSystem.initialize(uri, conf);
    Path path = new Path(sKey);
    FSDataInputStream dataInputStream = fileSystem.open(path);
    return dataInputStream;
}
while (fileNotReadFully) {
    int bytesReadThisRead = 0;
    try {
        bytesReadThisRead = is.read(data);
        totalBytesRead += bytesReadThisRead;
        fileNotReadFully = (bytesReadThisRead != -1);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
if (is != null) {
    try {
        is.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Could probably try different buffer sizes etc.
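For example (just a sketch; 64 KB is an arbitrary choice), both the copy
buffer and the stream's io buffer could be made bigger:

byte[] data = new byte[65536]; // larger copy buffer
// FileSystem.open() also accepts an explicit buffer size:
FSDataInputStream dataInputStream = fileSystem.open(path, 65536);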
Thanx,
Taj
Raghu Angadi wrote:
How slow is it? The code that does the reads may be relevant too.
Raghu.
j2eeiscool wrote:
Hi,
I am new to Hadoop. We are evaluating HDFS for use as a reliable,
distributed file system.
From the tests I have run so far (1 namenode + 1 datanode, on different
RHEL 4 machines, with the client running on the namenode machine):
1. The writes are very fast.
2. The read is very slow (reading a 68 MB file). Here is the sample code;
any ideas what could be going wrong?
public InputStream select(String sKey) throws RecordNotFoundException,
        IOException {
    DistributedFileSystem fileSystem = new DistributedFileSystem();
    fileSystem.initialize(uri, conf);
    Path path = new Path(sKey);
    FSDataInputStream dataInputStream = fileSystem.open(path);
    return dataInputStream;
}
Thanx,
Taj