On 22.11.2017 22:08, Evgeny Kotkov wrote:
Stefan Sperling <s...@apache.org> writes:

However, if rep-sharing is enabled, svn_fs_props_changed() does not work
as advertised because properties do not carry a SHA1 checksum with a
"uniquifier" which identifies the transaction they were created in.
The uniquifier is used by svn_fs_props_changed() to tell apart property
representations which share content but were created in different revisions.

To fix that problem, make FSFS write SHA1 checksums along with uniquifiers
for file properties, just as it is already done for file content.

A source code comment indicates that SHA1 checksums for property reps
were not written due to concerns over disk space. In hindsight, this was
a bad trade-off because it affected correctness of svn_fs_props_changed().
Thinking about this change, it could be that writing the additional 40-byte
SHA1 for the property representations is going to eliminate the benefit of
sharing them in the first place.
Here some more background after digging through the commit history.

The original comment about space saving was ill-worded. It should have
said something like "storing sha1 etc. is pointless because nobody uses
it for props & dirs". Hence, it is a waste of space.

In 1.8, svn_fs_fs__dag_things_different got broken for props due to the
missing uniquifier but that went unnoticed. This has not been a deliberate
trade-off but a mere oversight of the potential implication of shared reps.
It needs to be fixed.

One reason for rep sharing is that rep changes tend to be rare and often
not "in sync" with text changes. That means, they must be pulled from
a different repo location than both, the node tree (dirs and noderevs)
and the file contents. Eliminating most of these accesses by reading
only a few shared proplists should give some speedup.

Space savings on disk are not the main focus. Those might happen for
commits with mass prop changes and a longer list of standard-props
at each node. Savings in the membuffer cache may be more significant,
though, because its entry directory has a limited size and storage may
remain unused if the average cached item size drops below 1k.
If I recall correctly, rep sharing for properties is mostly there to gain
from deduplicating small properties, such as svn:eol-style or svn:keywords.
But with the additional SHA1 written in every representation string, this
overhead is likely to take more space than the property itself.

An alternative approach that might be worth considering here would be:

  (1) Extend the on-disk format and allow representation strings without
      SHA1, but with the uniquifier, something like this (where "-" stands
      for "no SHA1"):

      15 0 563 7809 28ef320a82e7bd11eebdf3502d69e608 - 14-g/_5

      (The new format would be allowed starting from FSFS 8.)
Yes, that would be one option. If you are willing to provide a patch,
I would probably +1 it. Not bothering with the space savings would
be a valid choice as well, IMO.
  (2) Use the new format to allow rep sharing for properties that writes
      the uniquifier so that svn_fs_props_changed() would work correctly,
      and doesn't introduce the overhead of writing SHA1 in the representation
      string for every property.

  (3) Disable rep sharing for properties in FSFS formats < 8 that cannot
      read and write such representation strings without SHA1, but with an
      uniquifier.
I'd rather have the pre fsfs-8 repos use the full sha1+uniquifier.
That is not worse than storing the props themselves and will
preserve the benefits mentioned above.
Barring objections and alternative suggestions, I could give a shot at
implementing this.
Go for it. Maybe notify me once you are done b/c currently, I don't
monitor the commit activity closely.

-- Stefan^2.

Reply via email to