On 03.01.2012 10:01, Ivan Zhakov wrote:
On Sun, Dec 18, 2011 at 22:46, Stefan Fuhrmann
<stefanfuhrm...@alice-dsl.de>  wrote:
For interested parties,

Work on said branch is done so far and I think
it is generally ready for being merged into /trunk.
But it is certainly a good idea to give it some
detailed review, in particular with regard to side-
effects on e.g. FSFS recovery code.

What it does in a nutshell:

* to FSFS svn_fs_t, attaches a cache for open file
  handles to reduce the number of open, lseek,
  read and close operations
* svnserve CL and apache config option to set
  the per-process maximum of cached handles
  (default: 16)

* use cached file handles for r/o access to revision
  files in FSFS
* notify the cache after revision files have been
  written ("flush" cache entries for these files)

With the repository on a linux RAM disk, it speeds
up a "cold" export of TSVN /trunk by approx. 10%.
Slower disks may show larger gains.

If there are no major objections, I would like to merge
the code into /trunk around mid-January.

Hi, Stefan!

As I stated before I'm -1 for merging file-handles cache stuff to trunk:
1. The performance gain is not significant and may vary from
environment to environment.

What would constitute a significant improvement?
Keep in mind that these measurements did not
involve any physical disk access or even things
like a SAN or even repositories on NFS shares.

From what I have seen, the patch eliminates up to
~75% of I/O activity.
2. We introduce really dirty hacks. After merging your branch we have
to worry about file handles to keep them in reusable state or do not
forget to notify file handles cache to flush this handle.

That is indeed an issue. I think I can rework the code
to disable the cache as soon as a transaction starts.
The cache is mainly effective in read requests.
3. By design handles are not intended to be cached, underlying objects
should be cached.
What would those underlying objects be? File buffers,
position of the file pointer? Well, that is *exactly* what
the cache does.

In fact, most of the performance improvement stems from
intelligently aligning and re-using existing data buffers
instead of re-reading the same data over and over.

  In your branch your introduce handles to handles.
APR does the same. Is that unreasonable?
4. It seems current implementation reuses file handles even error is
occurred when working with the handle. It's potentially dangerous from
my point view.

Interesting point. But I think it will not cause issues in
our case because we use it only within a FSFS "session".
Any I/O error should result in a failure of the respective FS
operation, in turn freeing the whole cache instance.

The whole problem with I/O improvement is that there
is probably no other way to make progress in the next
3 years: Some of it could be addressed within APR,
but that won't be available for Apache < 2.4 (2.6?).
A better solution is FSv2 but that is at least 3 years
away from now.

-- Stefan^2.

Reply via email to