On 20.02.2011 21:02, Johan Corveleyn wrote:
> On Sun, Feb 20, 2011 at 6:35 PM, Mark Mielke <m...@mark.mielke.cc> wrote:
>
>> That said, I'm also (in principle) against implementing a cache of open
>> file handles. I prefer architectures in which the application caches
>> intermediate data in a processed form it has deliberately chosen, so that
>> the cache is as useful as possible to the application, rather than a
>> transparent caching layer that guesses at what is safe. The OS file
>> system layer is exactly this - any caching it does is transparent to the
>> application and a guess. Guesses are dangerous, which is exactly why the
>> OS file system layer cannot do as much caching unless it has 100% control
>> of the file system (= local file system).

Agreed. For that very reason, I added extensive
caching to the FSFS code and have even more of that
in the pipeline for 1.8.

That being said, there are still typical situations in
which the data cache may not be effective:

* access to relatively rarely read data
  (log, older tags;
   you still want to perform decently in that case)
* first access to the latest revision
  (due to the way transactions are implemented,
   it is difficult to fill all the caches upon write)
* amount of active data > available RAM
  (throws you back to the first issue more often)
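
To make the "processed form" idea above concrete, here is a rough
sketch of the kind of structure I have in mind: a tiny, fixed-size
cache keyed by revision number that stores already-parsed data instead
of raw file contents. All names and sizes are made up for illustration;
this is not the actual FSFS code:

[[[
/* Hypothetical sketch: cache parsed, application-level data rather
 * than raw file contents.  Illustrative only, not FSFS code. */

#define REV_CACHE_SIZE 1024

/* Whatever expensive-to-recompute representation the caller needs. */
typedef struct parsed_rev_t
{
  long revision;
  long root_offset;      /* e.g. offset of the root node in the rev file */
  long changes_offset;   /* e.g. offset of the changed-paths list */
} parsed_rev_t;

static parsed_rev_t cache[REV_CACHE_SIZE];
static int cache_valid[REV_CACHE_SIZE];

/* Look up REVISION; return NULL on a cache miss. */
const parsed_rev_t *
rev_cache_get(long revision)
{
  int idx = (int)(revision % REV_CACHE_SIZE);
  if (cache_valid[idx] && cache[idx].revision == revision)
    return &cache[idx];
  return NULL;
}

/* Store an already-parsed entry; a colliding older entry is evicted. */
void
rev_cache_put(const parsed_rev_t *entry)
{
  int idx = (int)(entry->revision % REV_CACHE_SIZE);
  cache[idx] = *entry;
  cache_valid[idx] = 1;
}
]]]

A cache like this only pays off for data that actually gets re-read,
which is why the three situations above still hurt even with extensive
caching.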

> I agree that it would be best if the architecture were such that svn
> could organize its work for most use cases in a way that's efficient
> for the lower levels of the system. For instance, for "svn log", svn
> should in theory be able to do its work with exactly 1 open/close per
> rev file (or in a packed repository, maybe even only 1 open/close per
> packed file).

Yes, it can be very hard to anticipate what data will
be needed further down the road, even if we had a
marvelous "1 query gets it all" interface wherever that
is feasible: svn log, for instance, is often run with a
limit on the number of results, and there is no way to
tell in advance how much of a packed file needs to be
read to process that query. There is only a lower bound.

So, it can be very beneficial to keep a small number of
file handles around to "bridge" various stages / iterations
within a single request.
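
To illustrate what I mean by "bridging", here is a minimal sketch of
such a handle cache: a handful of slots keyed by path, with the least
recently used entry being evicted. Again, the names are made up and
this is not the code on the performance branch:

[[[
/* Hypothetical sketch of a tiny per-request file-handle cache.
 * Illustrative only; not the performance-branch implementation. */
#include <stdio.h>
#include <string.h>

#define FH_CACHE_SIZE 4

typedef struct fh_entry_t
{
  char path[256];
  FILE *file;
  unsigned long last_used;
} fh_entry_t;

static fh_entry_t fh_cache[FH_CACHE_SIZE];
static unsigned long fh_tick;

/* Return an open handle for PATH, reusing a cached one if possible.
 * On a miss, the least recently used slot gets evicted. */
FILE *
fh_cache_open(const char *path)
{
  int i, victim = 0;

  for (i = 0; i < FH_CACHE_SIZE; i++)
    if (fh_cache[i].file && strcmp(fh_cache[i].path, path) == 0)
      {
        fh_cache[i].last_used = ++fh_tick;
        return fh_cache[i].file;       /* hit: no open() syscall needed */
      }

  for (i = 1; i < FH_CACHE_SIZE; i++)
    if (fh_cache[i].last_used < fh_cache[victim].last_used)
      victim = i;

  if (fh_cache[victim].file)
    fclose(fh_cache[victim].file);

  fh_cache[victim].file = fopen(path, "rb");
  if (fh_cache[victim].file)
    {
      strncpy(fh_cache[victim].path, path,
              sizeof(fh_cache[victim].path) - 1);
      fh_cache[victim].path[sizeof(fh_cache[victim].path) - 1] = '\0';
      fh_cache[victim].last_used = ++fh_tick;
    }
  return fh_cache[victim].file;
}

/* Close everything at the end of the request so that no handle
 * outlives it. */
void
fh_cache_close_all(void)
{
  int i;
  for (i = 0; i < FH_CACHE_SIZE; i++)
    if (fh_cache[i].file)
      {
        fclose(fh_cache[i].file);
        fh_cache[i].file = NULL;
        fh_cache[i].path[0] = '\0';
        fh_cache[i].last_used = 0;
      }
}
]]]

A request would call fh_cache_open() wherever it currently opens a rev
file and fh_cache_close_all() before returning, so the handles never
live longer than the request itself.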

> But right now, this isn't the case, and I think it would be a huge
> amount of work to get there: changes to the architecture, layering, ...
> Until that happens, I think such a generic file-handle caching layer
> could prove very helpful :-). Note though that, if I understood
> correctly, the file-handle caching of the performance branch will not be
> reintegrated into 1.7, but maybe 1.8 ...
>
> But maybe stefan2 can comment more on that :-).

Because keeping files open for a potentially much
longer period of time may have an impact on other,
rarely run operations like pack, I don't think we should
risk merging this into 1.7.
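
To illustrate the kind of interference I have in mind (an assumed
scenario for illustration, not something we have measured): pack
consolidates the rev files of a shard into a pack file and then deletes
the originals. A handle that a cache kept open across that point either
blocks the delete (Windows, sharing violation) or keeps operating on a
file that is already gone (POSIX). A minimal stand-alone demonstration
of the POSIX case, with error handling mostly omitted:

[[[
/* Hypothetical demo, not Subversion code: a long-lived handle keeps
 * working on a file that pack has already deleted. */
#include <stdio.h>

int main(void)
{
  FILE *f, *cached;
  char buf[32] = { 0 };

  /* Simulate a rev file and a handle that a cache keeps open. */
  f = fopen("rev.tmp", "w");
  if (!f)
    return 1;
  fputs("old revision data", f);
  fclose(f);
  cached = fopen("rev.tmp", "r");

  /* Simulate 'svnadmin pack' deleting the rev file after packing it.
   * On POSIX this succeeds and the open handle keeps reading the old
   * data; on Windows the delete would fail with a sharing violation. */
  remove("rev.tmp");

  /* The cached handle is now reading a file that no longer exists. */
  fread(buf, 1, sizeof(buf) - 1, cached);
  printf("cached handle reads: %s\n", buf);  /* "old revision data" */

  fclose(cached);
  return 0;
}
]]]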

-- Stefan^2.
