On Thu, 15 Oct 2009 11:32:35 +0200
Usman Waheed <[email protected]> wrote:
> Hi Todd,
>
> Some changes have been applied to the cluster based on the
> documentation (URL) you noted below,
I would also like to know what settings people are tuning at the
operating system level. The blog post referred to here says little
about that, except for the fileno changes.
We got about 3x the read performance when running TestDFSIO after
mounting our ext3 filesystems with the noatime option. I saw that
mentioned in the slides from a Cloudera presentation.
(For those who don't know, the noatime option turns off the recording
of access times on files. Recording atime is a horrible performance
killer, since every read of a file means the kernel must also do a
write. These writes are probably queued up, but still, if you don't
need the atime (very few applications do), turn it off!)
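To check which data disks are still recording atime, something like the following could be used. It is only a minimal sketch: the function name, the device names and the /data/N mount points are made up for illustration, and it assumes the usual /proc/mounts layout (device, mount point, filesystem type, comma-separated options).

```python
def ext3_mounts_without_noatime(mounts_text):
    """Return mount points of ext3 filesystems not mounted with noatime."""
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type == "ext3" and "noatime" not in options.split(","):
            missing.append(mount_point)
    return missing


# Hypothetical sample in /proc/mounts format (not taken from a real node).
SAMPLE = """\
/dev/sdb1 /data/1 ext3 rw,noatime,data=ordered 0 0
/dev/sdc1 /data/2 ext3 rw,data=ordered 0 0
proc /proc proc rw 0 0
"""

print(ext3_mounts_without_noatime(SAMPLE))
```

On a real node the same function could be fed the contents of /proc/mounts instead of the sample string.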
Have people been experimenting with different filesystems, or are most
of us running on top of ext3?
How about mounting ext3 with "data=writeback"? That's rumoured to give
the best throughput and could help with write performance. From
mount(8):
    writeback
        Data ordering is not preserved - data may be written into the
        main file system after its metadata has been committed to the
        journal. This is rumoured to be the highest throughput option.
        It guarantees internal file system integrity, however it can
        allow old data to appear in files after a crash and journal
        recovery.
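For anyone who wants to try this on a test node, here is a rough sketch of how an /etc/fstab entry could be rewritten to carry both options. The helper name and the sample entry are hypothetical, and it assumes the standard six-field fstab layout with the mount options in the fourth field.

```python
def with_mount_options(fstab_line, extra_options):
    """Return the fstab line with extra_options merged into its options field."""
    fields = fstab_line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 fstab fields")
    options = fields[3].split(",")
    for opt in extra_options:
        key = opt.split("=", 1)[0]
        # Drop any existing setting of the same key (e.g. data=ordered)
        # so that the new value replaces it instead of conflicting with it.
        options = [o for o in options if o.split("=", 1)[0] != key]
        options.append(opt)
    fields[3] = ",".join(options)
    return " ".join(fields)


# Hypothetical entry for one data disk.
line = "/dev/sdb1 /data/1 ext3 defaults,data=ordered 0 2"
print(with_mount_options(line, ["noatime", "data=writeback"]))
# prints: /dev/sdb1 /data/1 ext3 defaults,noatime,data=writeback 0 2
```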
How would the HDFS consistency checks cope with old data appearing in
the underlying files after a system crash?
Cheers,
\EF
--
Erik Forsberg <[email protected]>
Developer, Opera Software - http://www.opera.com/