Hi Raghu,
Good catch, thanx. totalBytesRead is not used for any decision logic, so the
off-by-one should be harmless.
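For what it's worth, if totalBytesRead ever does get used for anything, I would
guard the accumulation so the -1 returned at EOF is never added. A minimal
sketch, reusing the same variable names from my read loop below:

    int bytesReadThisRead;
    while ((bytesReadThisRead = is.read(data)) != -1) {
        // only accumulate bytes from successful reads; EOF (-1) exits the loop
        totalBytesRead += bytesReadThisRead;
    }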
I ran the client from another m/c and the read was about 4 times faster.
I have the tcpdump from the original client m/c.
This is probably asking too much, but is there anything in particular I should
be looking for in the tcpdump? It is about 16 megs in size.
Thanx,
Taj
Raghu Angadi wrote:
>
>
> That's too long; buffer size does not explain it. The only small problem I
> see in your code:
>
> > totalBytesRead += bytesReadThisRead;
> > fileNotReadFully = (bytesReadThisRead != -1);
>
> totalBytesRead ends up off by 1, since the final read() returns -1 and
> that gets added in. Not sure where totalBytesRead is used.
>
> If you can, try to check a tcpdump on your client machine (for datanode
> port 50010).
>
> Raghu.
>
> j2eeiscool wrote:
>> Hi Raghu,
>>
>> Many thanx for your reply:
>>
>> The write takes approximately 11367 millisecs.
>>
>> The read takes approximately 1610565 millisecs.
>>
>> The file size is 68573254 bytes and the HDFS block size is 64 megs.
>>
>>
>> Here is the Writer code:
>>
>> FileInputStream fis = null;
>> OutputStream os = null;
>> try {
>>     fis = new FileInputStream(new File(inputFile));
>>     os = dsmStore.insert(outputFile);
>>
>>
>>
>> dsmStore.insert does the following:
>> {
>>     DistributedFileSystem fileSystem = new DistributedFileSystem();
>>     fileSystem.initialize(uri, conf);
>>     Path path = new Path(sKey);
>>     // create the HDFS output stream for writing
>>     FSDataOutputStream dataOutputStream = fileSystem.create(path);
>>
>>     return dataOutputStream;
>> }
>>
>>
>> byte[] data = new byte[4096];
>> int bytesRead;
>> while ((bytesRead = fis.read(data)) != -1) {
>>     // write only the bytes actually read; a short read would otherwise
>>     // write stale bytes from the end of the buffer
>>     os.write(data, 0, bytesRead);
>> }
>> os.flush();
>> } catch (Exception e) {
>>     e.printStackTrace();
>> } finally {
>>     if (os != null) {
>>         try {
>>             os.close();
>>         } catch (IOException e) {
>>             // TODO Auto-generated catch block
>>             e.printStackTrace();
>>         }
>>     }
>>
>>     if (fis != null) {
>>         try {
>>             fis.close();
>>         } catch (IOException e) {
>>             // TODO Auto-generated catch block
>>             e.printStackTrace();
>>         }
>>     }
>> }
>> }
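>>
>> (I could probably also replace the whole copy loop with Hadoop's helper,
>> assuming the version we are on has org.apache.hadoop.io.IOUtils; just a
>> sketch, not something I have run:
>>
>> import org.apache.hadoop.io.IOUtils;
>> ...
>> // copy fis to os with a 4096-byte buffer and close both streams when done
>> IOUtils.copyBytes(fis, os, 4096, true);
>>
>> which would make the close() calls in the finally block unnecessary.)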
>>
>>
>> Here is the Reader code:
>>
>>
>> byte[] data = new byte[4096];
>> int totalBytesRead = 0;
>> boolean fileNotReadFully = true;
>> InputStream is = dsmStore.select(fileName);
>>
>>
>> dsmStore.select does the following:
>> {
>>     DistributedFileSystem fileSystem = new DistributedFileSystem();
>>     fileSystem.initialize(uri, conf);
>>     Path path = new Path(sKey);
>>     // open the HDFS input stream for reading
>>     FSDataInputStream dataInputStream = fileSystem.open(path);
>>
>>     return dataInputStream;
>> }
>>
>>
>>
>> while (fileNotReadFully) {
>>     int bytesReadThisRead = 0;
>>     try {
>>         bytesReadThisRead = is.read(data);
>>         totalBytesRead += bytesReadThisRead;
>>         fileNotReadFully = (bytesReadThisRead != -1);
>>     } catch (Exception e) {
>>         e.printStackTrace();
>>     }
>> }
>> if (is != null) {
>>     try {
>>         is.close();
>>     } catch (IOException e) {
>>         // TODO Auto-generated catch block
>>         e.printStackTrace();
>>     }
>> }
>>
>>
>> I could probably try different buffer sizes next.
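>>
>> For example, something like this in dsmStore.select (just a sketch; the
>> 64 KB numbers are a guess on my part, not something I have measured):
>>
>> // ask for a larger client-side read buffer instead of the default
>> conf.setInt("io.file.buffer.size", 65536);
>> FSDataInputStream dataInputStream = fileSystem.open(path, 65536);
>>
>> plus a correspondingly larger byte[] buffer in the read loop.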
>>
>> Thanx,
>> Taj
>>
>>
>> Raghu Angadi wrote:
>>>
>>> How slow is it? Maybe the code that does the read is relevant too.
>>>
>>> Raghu.
>>>
>>> j2eeiscool wrote:
>>>> Hi,
>>>>
>>>> I am new to Hadoop. We are evaluating HDFS for use as a reliable,
>>>> distributed file system.
>>>>
>>>> From the tests I have run so far (1 namenode + 1 datanode on different
>>>> RHEL 4 m/cs, with the client running on the namenode m/c):
>>>>
>>>> 1. The writes are very fast.
>>>>
>>>> 2. The reads are very slow (reading a 68 meg file). Here is the sample
>>>> code; any ideas what could be going wrong?
>>>>
>>>>
>>>> public InputStream select(String sKey) throws RecordNotFoundException,
>>>>         IOException {
>>>>     DistributedFileSystem fileSystem = new DistributedFileSystem();
>>>>     fileSystem.initialize(uri, conf);
>>>>     Path path = new Path(sKey);
>>>>     FSDataInputStream dataInputStream = fileSystem.open(path);
>>>>     return dataInputStream;
>>>> }
>>>>
>>>> Thanx,
>>>> Taj
>>>>
>>>>
>>>
>>>
>>
>
>
>