Hi Raghu,

Good catch, thanks. totalBytesRead is not used for any decision etc.

I ran the client from another machine and the read was about 4 times faster.
I have the tcpdump from the original client machine. This is probably asking
too much, but is there anything in particular I should be looking for in the
tcpdump? It (the tcpdump) is about 16 MB in size.

Thanks,
Taj
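(For reference, one way to address the off-by-one Raghu mentions: a minimal
sketch that only accumulates positive counts, so totalBytesRead stays exact in
case it is ever used later. Variable names follow the reader loop quoted
further down.)

    // read() returns -1 at end of stream; only add positive counts,
    // otherwise the final -1 gets folded into the total.
    int bytesReadThisRead = is.read(data);
    if (bytesReadThisRead > 0) {
        totalBytesRead += bytesReadThisRead;
    }
    fileNotReadFully = (bytesReadThisRead != -1);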
Raghu Angadi wrote:
>
> That's too long; buffer size does not explain it. The only small problem I
> see in your code:
>
>  > totalBytesRead += bytesReadThisRead;
>  > fileNotReadFully = (bytesReadThisRead != -1);
>
> totalBytesRead is off by 1. Not sure where totalBytesRead is used.
>
> If you can, try to check tcpdump on your client machine (for datanode
> port 50010).
>
> Raghu.
>
> j2eeiscool wrote:
>> Hi Raghu,
>>
>> Many thanks for your reply.
>>
>> The write takes approximately 11367 ms (about 11 seconds).
>> The read takes approximately 1610565 ms (about 27 minutes).
>> The file size is 68573254 bytes and the HDFS block size is 64 MB.
>>
>> Here is the Writer code:
>>
>>     FileInputStream fis = null;
>>     OutputStream os = null;
>>     try {
>>         fis = new FileInputStream(new File(inputFile));
>>         os = dsmStore.insert(outputFile);
>>
>>         // dsmStore.insert does the following:
>>         // {
>>         //     DistributedFileSystem fileSystem = new DistributedFileSystem();
>>         //     fileSystem.initialize(uri, conf);
>>         //     Path path = new Path(sKey);
>>         //     // writing:
>>         //     FSDataOutputStream dataOutputStream = fileSystem.create(path);
>>         //     return dataOutputStream;
>>         // }
>>
>>         byte[] data = new byte[4096];
>>         while (fis.read(data) != -1) {
>>             os.write(data);
>>             os.flush();
>>         }
>>     } catch (Exception e) {
>>         e.printStackTrace();
>>     } finally {
>>         if (os != null) {
>>             try {
>>                 os.close();
>>             } catch (IOException e) {
>>                 e.printStackTrace();
>>             }
>>         }
>>         if (fis != null) {
>>             try {
>>                 fis.close();
>>             } catch (IOException e) {
>>                 e.printStackTrace();
>>             }
>>         }
>>     }
>>
>> Here is the Reader code:
>>
>>     byte[] data = new byte[4096];
>>     int totalBytesRead = 0;
>>     boolean fileNotReadFully = true;
>>     InputStream is = dsmStore.select(fileName);
>>
>>     // dsmStore.select does the following:
>>     // {
>>     //     DistributedFileSystem fileSystem = new DistributedFileSystem();
>>     //     fileSystem.initialize(uri, conf);
>>     //     Path path = new Path(sKey);
>>     //     FSDataInputStream dataInputStream = fileSystem.open(path);
>>     //     return dataInputStream;
>>     // }
>>
>>     while (fileNotReadFully) {
>>         int bytesReadThisRead = 0;
>>         try {
>>             bytesReadThisRead = is.read(data);
>>             totalBytesRead += bytesReadThisRead;
>>             fileNotReadFully = (bytesReadThisRead != -1);
>>         } catch (Exception e) {
>>             e.printStackTrace();
>>         }
>>     }
>>     if (is != null) {
>>         try {
>>             is.close();
>>         } catch (IOException e) {
>>             e.printStackTrace();
>>         }
>>     }
>>
>> Could probably try different buffer sizes etc.
>>
>> Thanks,
>> Taj
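(A side note on the Writer loop quoted above: fis.read(data) may return fewer
than 4096 bytes, but os.write(data) always writes the whole buffer, so the
last chunk can pad the HDFS copy with stale bytes; flushing after every 4 KB
chunk is also unnecessary. A corrected copy loop, as a minimal sketch using
the same variable names, might look like this:)

    byte[] data = new byte[4096];
    int n;
    // Write only the bytes actually read; a short read near end of file
    // would otherwise carry leftover buffer contents into the output.
    while ((n = fis.read(data)) != -1) {
        os.write(data, 0, n);
    }
    os.flush();  // flush once at the end rather than after every chunk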
>>
>> Raghu Angadi wrote:
>>>
>>> How slow is it? Maybe the code that reads is relevant too.
>>>
>>> Raghu.
>>>
>>> j2eeiscool wrote:
>>>> Hi,
>>>>
>>>> I am new to Hadoop. We are evaluating HDFS for use as a reliable,
>>>> distributed file system.
>>>>
>>>> From the tests I have run so far (1 name node + 1 data node, on
>>>> separate RHEL 4 machines, with the client running on the name node
>>>> machine):
>>>>
>>>> 1. The writes are very fast.
>>>>
>>>> 2. The read is very slow (reading a 68 MB file). Here is the sample
>>>> code. Any ideas what could be going wrong:
>>>>
>>>>     public InputStream select(String sKey) throws RecordNotFoundException,
>>>>             IOException {
>>>>         DistributedFileSystem fileSystem = new DistributedFileSystem();
>>>>         fileSystem.initialize(uri, conf);
>>>>         Path path = new Path(sKey);
>>>>         FSDataInputStream dataInputStream = fileSystem.open(path);
>>>>         return dataInputStream;
>>>>     }
>>>>
>>>> Thanks,
>>>> Taj
>>>>
>>>
>>
>

--
View this message in context: http://www.nabble.com/HDFS-File-Read-tf4773580.html#a13715077
Sent from the Hadoop Users mailing list archive at Nabble.com.
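(Following up on the "could probably try different buffer sizes" remark in the
thread: a minimal sketch of how the select() helper could ask for a larger
client-side read buffer. The 128 KB figure and the reuse of the same uri and
conf fields are assumptions for illustration, not something from the original
post.)

    // Hypothetical variant of dsmStore.select(): same uri/conf fields as in
    // the thread, but with a larger client-side buffer than the default.
    Configuration conf = new Configuration();
    conf.setInt("io.file.buffer.size", 128 * 1024);   // assumed 128 KB buffer

    DistributedFileSystem fileSystem = new DistributedFileSystem();
    fileSystem.initialize(uri, conf);

    Path path = new Path(sKey);
    // FileSystem.open(Path, int) also lets the caller pass a buffer size.
    FSDataInputStream dataInputStream = fileSystem.open(path, 128 * 1024);
    return dataInputStream;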