On 20.02.2011 09:50, Ivan Zhakov wrote:
On Wed, Dec 29, 2010 at 22:37, Stefan Fuhrmann<eq...@web.de>  wrote:
The fopen() calls should be eliminated by the
file handle cache. IOW, they should already be
addressed on the performance branch. Please
let me know if that is not the case.

Just my 20 cents.
High roller.
My belief is that a file handle cache should be implemented at the OS level,
and I'm pretty sure that it is.
You can certainly provide data to demonstrate your claim?

In fact, fopen() is extremely expensive (1..5ms) on FS with
ACLs. Even for a local, low overhead (EXT3) FS, the effect
of handle caching is significant:

time ./svnadmin verify $TSVN_MIRROR -q -F 256 -M 0
real   1m46.603s
user   1m43.474s
sys    0m3.132s

time ./svnadmin verify $TSVN_MIRROR -q -F 0 -M 0
real   2m26.664s
user   2m0.856s
sys    0m25.818s

Note that the gains are split about 50:50 between the OS
and the application. Things become even more interesting
albeit less easily demonstrable with concurrent queries
being run by a threaded server. One would expect an even
higher level of reuse.
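
As a rough check of the "about 50:50" remark, the differences between
the two runs can be worked out as below. This is only a small Python
sketch: the timings are copied from the measurements above, while the
helper and variable names are made up for illustration.

```python
def seconds(minutes, secs):
    """Convert an 'XmY.YYYs' reading from time(1) into seconds."""
    return minutes * 60 + secs

# Timings copied from the two svnadmin runs, keyed by their -F value.
run_f256 = {"real": seconds(1, 46.603), "user": seconds(1, 43.474), "sys": seconds(0, 3.132)}
run_f0 = {"real": seconds(2, 26.664), "user": seconds(2, 0.856), "sys": seconds(0, 25.818)}

# Time saved per category when going from -F 0 to -F 256.
saved = {key: run_f0[key] - run_f256[key] for key in run_f256}

# Split of the saved CPU time between application (user) and OS (sys).
cpu_saved = saved["user"] + saved["sys"]
user_share = saved["user"] / cpu_saved
sys_share = saved["sys"] / cpu_saved

print(f"saved: real {saved['real']:.3f}s, user {saved['user']:.3f}s, sys {saved['sys']:.3f}s")
print(f"share of saved CPU time: user {user_share:.0%}, sys {sys_share:.0%}")
```

That gives roughly 40s saved in wall-clock time, of which about 17s is
user time and about 23s is sys time, so the split is closer to 43:57
than to an exact 50:50.
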
And the right way to reduce the
number of duplicate fopen()/read() calls is to improve our FS API.
Why would that be necessary if the OS already takes care
of all the optimizations?

FSFS6 is about optimizing the interface between OS and
the FSFS code: Fewer seek()s and drastically reduced
number of read()s.

Once that is in place and its behavior well understood, we
may start designing I/O aggregation and scheduling. In
particular, holding off requests while another request already
fetches the desired data will be a very interesting task.
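
To illustrate what "holding off requests" could look like, here is a
minimal Python sketch of request coalescing. It is not FSFS code:
CoalescingReader, _Pending and the fetch callable are hypothetical
names, and a real design would also have to deal with caching,
timeouts and cancellation.

```python
import threading

class _Pending:
    """Bookkeeping for one fetch that is currently in progress."""
    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None

class CoalescingReader:
    """Let concurrent requests for the same key share a single fetch."""

    def __init__(self, fetch):
        self._fetch = fetch          # hypothetical callable: key -> data
        self._lock = threading.Lock()
        self._in_flight = {}         # key -> _Pending

    def get(self, key):
        with self._lock:
            pending = self._in_flight.get(key)
            leader = pending is None
            if leader:
                # First request for this key: it will do the actual I/O.
                pending = _Pending()
                self._in_flight[key] = pending
        if leader:
            try:
                pending.result = self._fetch(key)
            except Exception as exc:
                pending.error = exc
            finally:
                with self._lock:
                    del self._in_flight[key]
                pending.done.set()   # wake up everyone who was held off
        else:
            # Another request already fetches the data: wait for it.
            pending.done.wait()
        if pending.error is not None:
            raise pending.error
        return pending.result
```

Only requests that overlap in time share a fetch here. Once the
leader has finished, the next request for the same key triggers a new
fetch, so this complements a data cache rather than replacing one.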

From what I understand of the FS API, there is very little
that needs to be added to allow for effective I/O optimization.
Basically, a simple "advise" or "prefetch" option on the
read functions could possibly do the trick.
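
For comparison, this is roughly what such a hint looks like at the OS
level. The Python sketch below uses os.posix_fadvise() and os.pread(),
which exist on Unix-like systems only. The function read_with_hint
and its prefetch parameter are hypothetical and not part of any
Subversion API.

```python
import os

def read_with_hint(path, offset, length, prefetch=None):
    """Read `length` bytes at `offset`, optionally hinting a later range.

    `prefetch` is a hypothetical (offset, length) pair that names data
    the caller expects to need next.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        if prefetch is not None and hasattr(os, "posix_fadvise"):
            try:
                # Purely advisory: the OS may start reading ahead.
                os.posix_fadvise(fd, prefetch[0], prefetch[1],
                                 os.POSIX_FADV_WILLNEED)
            except OSError:
                pass  # a rejected hint must not break the actual read
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)
```

The point is that the hint does not change what the read returns. It
only tells the lower layer what will probably be asked for next.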

If we get to that stage, I'm sure to receive "the OS should
take care of I/O scheduling and stuff" posts.
I haven't reviewed how the file handle cache is implemented in the
fs-performance branch, but I'm close to -1 on implementing a cache
of open file handles in Subversion.
File handle caching definitely has its drawbacks, risks
in particular. The number of file handles within an OS
instance is quite limited (typ. 1000) and open files may
prevent file deletion (e.g. during packing). The code is
supposed to take care of the latter but may be faulty.
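
Both risks can be made concrete with a small Python sketch. It is not
the code on the performance branch: FileHandleCache, get() and
invalidate() are hypothetical names. The cache is bounded so it cannot
exhaust the handle limit, and it can drop a handle explicitly before
the file is deleted.

```python
from collections import OrderedDict

class FileHandleCache:
    """Keep at most `capacity` read-only handles open, evicting LRU."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._handles = OrderedDict()    # path -> file object, oldest first

    def get(self, path):
        """Return an open handle for `path`, reusing a cached one if any.

        Callers must seek() before reading: a reused handle keeps the
        file position left behind by its previous user.
        """
        handle = self._handles.pop(path, None)
        if handle is None:
            handle = open(path, "rb")
        self._handles[path] = handle     # most recently used goes last
        while len(self._handles) > self._capacity:
            _, oldest = self._handles.popitem(last=False)
            oldest.close()               # stay below the handle limit
        return handle

    def invalidate(self, path):
        """Close and forget the handle for `path`, e.g. before deleting it."""
        handle = self._handles.pop(path, None)
        if handle is not None:
            handle.close()
```

Something like invalidate() would have to be called before packing
removes a file. Forgetting that call is exactly the kind of fault
mentioned above.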

Alternative designs are welcome.

-- Stefan^2.
