Branko Čibej <[email protected]> writes:

> I don't understand why we need two of these three checksums.
> We have the working stream and the retranslated stream – why not just do
> a byte-for-byte comparison between them instead of burning CPU cycles by
> computing checksums on exactly the same sets of data?

The problem here is that these are not independent streams, as both are
created around the same working file stream.  Consequently, they cannot be
compared using our standard means, such as svn_stream_contents_same2(),
because reading from one stream simultaneously advances the other.
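
To illustrate, here is a minimal self-contained sketch of the problem. It
does not use the real Subversion stream API: source_t, wrapper_t,
wrapper_read() and naive_contents_same() are hypothetical names made up for
this example, and the wrappers do no translation at all. It only shows that
two wrappers around one underlying stream share a single read position.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the working file stream: one read position. */
typedef struct source_t {
  const char *data;
  size_t len;
  size_t pos;
} source_t;

/* Hypothetical wrapper (think "detranslating" or "retranslating" stream)
   that has no position of its own and reads through the shared source. */
typedef struct wrapper_t {
  source_t *src;
} wrapper_t;

/* Read up to N bytes through the wrapper; returns the number of bytes read.
   Note that this advances the position of the shared source. */
static size_t
wrapper_read(wrapper_t *w, char *buf, size_t n)
{
  size_t avail;

  assert(w->src->pos <= w->src->len);
  avail = w->src->len - w->src->pos;
  if (n > avail)
    n = avail;
  memcpy(buf, w->src->data + w->src->pos, n);
  w->src->pos += n;
  return n;
}

/* Naive comparison loop: read a block from A, then a block from B, and
   compare them.  Returns 1 if every pair of blocks matched, 0 otherwise. */
static int
naive_contents_same(wrapper_t *a, wrapper_t *b)
{
  char buf_a[4], buf_b[4];

  for (;;)
    {
      size_t na = wrapper_read(a, buf_a, sizeof(buf_a));
      size_t nb = wrapper_read(b, buf_b, sizeof(buf_b));

      if (na != nb || memcmp(buf_a, buf_b, na) != 0)
        return 0;
      if (na == 0)
        return 1;
    }
}
```

With both wrappers created around a source holding "abcdefgh", the first
read through A consumes "abcd" and the first read through B then sees "efgh",
so the loop reports a mismatch even though both wrappers nominally expose
the same data.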

While it might be possible to reduce this to a single checksum and a bytewise
comparison, I think that it wouldn't be simple, and would require us to:

- Set up the detranslated checksumming push stream.
- Set up the retranslated push stream.
- Somehow arrange the retranslated push stream to error out on mismatching
  data, perhaps by wrapping it into an additional proxy stream.
- Read the working stream block-by-block, pushing data to both streams
  (or svn_stream_copy3() it into a "tee" result of two push streams).
- If we didn't get a mismatch from the retranslated stream, verify the
  checksum from the detranslated stream against the pristine checksum.
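
For what it's worth, the plumbing for the steps above could look roughly
like the sketch below. This is only an illustration under several
assumptions: sink_t, tee_write(), compare_write() and is_unmodified() are
hypothetical names rather than Subversion APIs, the translation itself is
left out entirely (the comparing sink just checks the pushed bytes against a
reference buffer), and FNV-1a stands in for a real digest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical push-stream interface: returns 0 on success, nonzero on
   error.  Not the real svn_stream_t API. */
typedef int (*write_fn_t)(void *baton, const char *data, size_t len);

typedef struct sink_t {
  write_fn_t write;
  void *baton;
} sink_t;

/* Sink 1: running checksum.  FNV-1a is only a stand-in for a real digest. */
#define FNV_OFFSET_BASIS 2166136261u
#define FNV_PRIME 16777619u

typedef struct checksum_baton_t {
  uint32_t hash;
} checksum_baton_t;

static int
checksum_write(void *baton, const char *data, size_t len)
{
  checksum_baton_t *cb = baton;
  size_t i;

  for (i = 0; i < len; i++)
    {
      cb->hash ^= (unsigned char)data[i];
      cb->hash *= FNV_PRIME;
    }
  return 0;
}

/* Sink 2: the "proxy" that errors out as soon as the pushed data differs
   from the expected bytes. */
typedef struct compare_baton_t {
  const char *expected;
  size_t expected_len;
  size_t pos;
} compare_baton_t;

static int
compare_write(void *baton, const char *data, size_t len)
{
  compare_baton_t *cb = baton;

  assert(cb->pos <= cb->expected_len);
  if (len > cb->expected_len - cb->pos
      || memcmp(cb->expected + cb->pos, data, len) != 0)
    return 1;
  cb->pos += len;
  return 0;
}

/* Tee: forward every block to both sinks, stopping at the first error. */
typedef struct tee_baton_t {
  sink_t *first;
  sink_t *second;
} tee_baton_t;

static int
tee_write(void *baton, const char *data, size_t len)
{
  tee_baton_t *tb = baton;
  int err = tb->first->write(tb->first->baton, data, len);

  if (err)
    return err;
  return tb->second->write(tb->second->baton, data, len);
}

/* Driver: push WORKING block-by-block into the tee.  Returns 1 only if the
   bytewise comparison saw no mismatch and the checksum matches. */
static int
is_unmodified(const char *working, size_t working_len,
              const char *expected, size_t expected_len,
              uint32_t pristine_hash)
{
  checksum_baton_t sum = { FNV_OFFSET_BASIS };
  compare_baton_t cmp = { expected, expected_len, 0 };
  sink_t sum_sink = { checksum_write, &sum };
  sink_t cmp_sink = { compare_write, &cmp };
  tee_baton_t tee = { &sum_sink, &cmp_sink };
  size_t pos = 0;

  while (pos < working_len)
    {
      size_t n = working_len - pos < 4 ? working_len - pos : 4;

      if (tee_write(&tee, working + pos, n))
        return 0;  /* Early exit on the first mismatching block. */
      pos += n;
    }
  if (cmp.pos != expected_len)
    return 0;  /* The expected data is longer than what was pushed. */
  return sum.hash == pristine_hash;
}
```

Even in this stripped-down form there are three cooperating pieces plus a
driver loop, and a real version would still need the translation wrappers
and proper error handling on top.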

I would guess that originally I preferred the three-checksum approach
for its relative simplicity, also considering the fact that from a practical
perspective this only affects a fairly niche case (reverting files that need
translation and that passed through all preliminary checks).


Thanks,
Evgeny Kotkov
