On 20. 1. 26 00:16, Evgeny Kotkov via dev wrote:
> Evgeny Kotkov <[email protected]> writes:
>
>> With a fresh look, I think that in the current state we might want to
>> indeed have the full content comparison for pristineful working copies,
>> and only use checksum-based comparison for pristineless working copies
>> (as described in your response).
>>
>> I'll see if I can put together a patch for this approach.
>
> Please find the patch attached.
>
> With the additional information I gained from the code, I realized that
> while it may be possible to rely on the global WC state, it also introduces
> a potential race condition and a non-transactional state dependency, where
> the global settings could conflict with the state of the pristine we are
> accessing transactionally.
>
> The refined approach makes a decision based on the current state of an
> individual pristine (which technically appears to be the correct source
> of truth for this layer of operations), and uses bytewise comparison if
> the pristine content is available.
>
> If there are no objections, I could commit the patch shortly.

I have no objections at all, the patch looks good.
But I do have one question that's only somewhat related to the patch
itself. In the new, refactored function compare_exact(), there's this
explanation:

    /* We don't have pristine contents. To make the comparison work without
       it, let's check for two things:

       1) That the checksum of the detranslated contents matches the recorded
          pristine checksum, as in the case of a non-exact comparison, ...

       2) ...and additionally, that the contents of the working file does not
          change when retranslated according to its properties.

       Technically we're going to do that with a single read of the file
       contents, while checksumming it's original, detranslated and
       retranslated versions.
    */

The code then proceeds to compute three checksums: of the original
working copy contents, of the detranslated contents, which we presume
would be the pristine text, and of the retranslated contents.
I don't understand why we need two of these three checksums. We have the
working stream and the retranslated stream – why not just do a
byte-for-byte comparison between them instead of burning CPU cycles by
computing checksums on exactly the same sets of data?
Not only is comparing the data much faster than computing its checksum,
but if the original and retranslated streams differ, the comparison
would also stop early without having to read the whole file.
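
A minimal sketch of that idea, using plain C stdio instead of
Subversion's stream API to keep it self-contained. The name
streams_identical() and the 4096-byte chunk size are assumptions made
for illustration only, not anything in the patch:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Subversion: compare two streams
   chunk by chunk and stop at the first difference.
   Returns 1 if the contents are identical, 0 if they differ and
   -1 on a read error.  If BYTES_READ is not NULL, it receives the
   number of bytes read from A before the function returned. */
static int
streams_identical(FILE *a, FILE *b, size_t *bytes_read)
{
  char buf_a[4096];
  char buf_b[4096];
  size_t total = 0;

  for (;;)
    {
      size_t n_a = fread(buf_a, 1, sizeof(buf_a), a);
      size_t n_b = fread(buf_b, 1, sizeof(buf_b), b);

      if (ferror(a) || ferror(b))
        return -1;

      total += n_a;
      if (bytes_read)
        *bytes_read = total;

      /* A short chunk on one side only, or any differing byte,
         means the contents differ: stop without reading further. */
      if (n_a != n_b || memcmp(buf_a, buf_b, n_a) != 0)
        return 0;

      /* Both streams hit EOF at the same point: identical. */
      if (n_a == 0)
        return 1;
    }
}
```

If the two streams already differ in the first chunk, this reads 4096
bytes from each and returns, regardless of file size, whereas a checksum
has to consume both streams to the end before anything can be compared.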
-- Brane