On Mon, Jul 7, 2014 at 5:54 PM, C. Michael Pilato <cmpil...@collab.net> wrote:
> On 07/07/2014 11:23 AM, Branko Čibej wrote:
>> On 07.07.2014 17:07, C. Michael Pilato wrote:
>>> On 07/07/2014 10:58 AM, Ivan Zhakov wrote:
>>>> My technical opinion that FSFS7/log addressing is slower by design,
>>>> because it's doing more (read index, then read data instead of just
>>>> read data) and only caching makes them comparable on performance to
>>>> FSFS6 repositories.
>>> I'm coming into this kinda late and after two weeks of vacation, so
>>> please forgive me if I misunderstand the above, but is it true that
>>> FSFS7 requires some kind of non-trivial caching just to match FSFS6's
>>> performance?
>>
>> Yup.
Nope. F7 is all about I/O reduction: no I/O, no reduction. The savings
are significant; a factor of 2 is typical, and even SSDs see speedups.
Data size and read operations (when to read which noderev / rep / ...)
are roughly unchanged. Thus, if caches are hot, the extra addressing
overhead costs you somewhere between 0% (hot SVN caches) and 10% CPU
(hot OS caches only).

F7 adds another feature that had to be made opt-in: "block-read".
Instead of reading only a few hundred bytes, it makes SVN parse the
whole 64k block that the OS provides anyway and put the data into the
cache. In environments with slow fopen(), that should save CPU, but it
requires significant SVN caches to be able to eventually use the
prefetched data. I'm particularly keen to see how much of an impact
that has on Windows.

> May I then presume that for folks who have many repositories being
> hosted from a single server, FSFS7 will necessarily bring either a CPU
> performance hit (insufficient cache) or a RAM requirement/consumption
> hit (sufficient, ginormous cache)? Or is the cache configuration
> perhaps per-server rather than per-repository?

I just started the Windows tests in a "realistic" environment: a
server with 4GB of RAM managing more than 50GB of repository data. The
goal is clearly that the default configuration is no slower than
before, and that the computational overhead is roughly the same as what
we added when introducing manifest files for packed repos.

The problem with all such measurements is that it is very hard to
create an environment that behaves roughly as the real world would
(multiple repositories created over a long period of time, interleaved
with each other). It took me a whole day to rewrite the copy script
that creates meaningful data sets for systems whose operation cannot be
controlled (SAN).

--
Stefan^2.
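The "block-read" idea described in the message can be illustrated with a
small Python sketch. This is a hypothetical illustration, not the FSFS
code: BlockReader, BLOCK_SIZE and the io_ops counter are made-up names,
the 64k blocks are assumed to be aligned, and the sketch caches raw bytes
rather than parsed noderevs / reps as SVN would. It only shows why several
small reads that land in the same block cost a single physical read once
that block sits in the cache.

```python
# Hypothetical sketch of "block-read" vs. reading only the requested bytes.
# Not the Subversion implementation; names and block alignment are assumed.
BLOCK_SIZE = 64 * 1024  # the 64k block the OS provides anyway (assumed aligned)


class BlockReader:
    def __init__(self, data: bytes, block_read: bool):
        self.data = data              # stands in for a revision / pack file
        self.block_read = block_read  # opt-in, as in the mail
        self.cache = {}               # block index -> raw block bytes
        self.io_ops = 0               # number of simulated physical reads

    def read(self, offset: int, length: int) -> bytes:
        if not self.block_read:
            # Plain mode: every request is its own small read.
            self.io_ops += 1
            return self.data[offset:offset + length]

        # Block mode: serve the request from whole cached blocks,
        # fetching (and caching) each missing block exactly once.
        result = bytearray()
        pos, end = offset, offset + length
        while pos < end:
            idx = pos // BLOCK_SIZE
            start = idx * BLOCK_SIZE
            block = self.cache.get(idx)
            if block is None:
                self.io_ops += 1
                block = self.data[start:start + BLOCK_SIZE]
                self.cache[idx] = block
            lo = pos - start
            if lo >= len(block):      # request runs past end of file
                break
            hi = min(end - start, len(block))
            result += block[lo:hi]
            pos = start + hi
        return bytes(result)
```

With three 200-byte reads inside the first block, the plain reader performs
three reads and the block reader one. The price is keeping whole blocks in
memory, which is the cache-size trade-off raised in the thread.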