On Thu, Jan 15, 2026 at 9:16 AM Evgeny Kotkov via dev <
[email protected]> wrote:

> Branko Čibej <[email protected]> writes:
>
> > Didn't we have file size and modification time as additional checks if
> > a full-text compare was needed? The size is recorded in the wc-db and
> > should be even if the pristine file is absent, but the mtime is not,
> > IIRC.
> > In any case, the more checks we use, the harder it is to construct a
> > collision.
>
> Yes, we begin by comparing file sizes and modification times against the
> values stored in wc.db.  This logic is identical for both pristineful and
> pristineless working copies.
>
> It gets slightly trickier with eol/keyword translation, but if no
> translation is needed, I think it boils down to this:
>
> - If both the sizes and timestamps match, the file is considered
>   unmodified.
> - If the sizes differ, the file is considered modified.
>
> However, there are still cases where these quick checks are inconclusive.
> For example, if a file is modified but retains the same size, or if the
> on-disk timestamps have somehow changed.  In those cases, we fall back
> to a content comparison via questions.c:compare_and_verify():
>
> - In trunk, compare_and_verify() does not distinguish between pristineful
>   and pristineless working copies and always performs a checksum-based
>   comparison (for instance, because the pristine content is unavailable
>   in the pristineless case).
>
> - In 1.14, compare_and_verify() always performs a content comparison
>   between the pristine and the working file.



Thanks for explaining this. (This clears up some questions I was going
to try to answer by researching the history.)
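
As a rough sketch of the quick checks described above (illustrative
Python, not the actual libsvn_wc code; quick_check() and Verdict are
hypothetical names, and it assumes no eol/keyword translation is needed):

```python
from enum import Enum


class Verdict(Enum):
    UNMODIFIED = "unmodified"
    MODIFIED = "modified"
    # Quick checks cannot decide; a content or checksum comparison is needed.
    INCONCLUSIVE = "inconclusive"


def quick_check(recorded_size, recorded_mtime, disk_size, disk_mtime):
    """Hypothetical helper mirroring the size/mtime checks against wc.db."""
    if disk_size != recorded_size:
        # Sizes differ: the file is considered modified.
        return Verdict.MODIFIED
    if disk_mtime == recorded_mtime:
        # Both size and timestamp match: considered unmodified.
        return Verdict.UNMODIFIED
    # Same size but a different timestamp: fall back to a full comparison.
    return Verdict.INCONCLUSIVE
```

Only the INCONCLUSIVE case reaches compare_and_verify(), which is the
part under discussion here.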

Since the checksum-based check is new in trunk (and 1.15)...


> I'm currently thinking that we could make compare_and_verify() perform a
> content comparison for pristineful working copies, to avoid changing more
> characteristics than necessary.  So my plan was to sketch a patch to see
> how this translates into code.
>


...I am inclined to agree with this plan.

In other words, if the plan comes to fruition, compare_and_verify()
would behave exactly as it does in 1.14.x unless the working copy is
pristineless.

One more thought:

In pristineless working copies, some pristines are available some of
the time, such as when they have been fetched for any reason by an
earlier operation. (In the current implementation, these may have been
fetched merely because they share a common subtree with several
modified files.) In such cases, a content comparison could be
performed rather than a checksum comparison. The decision
(whether to perform a content or checksum comparison) could be based
on whether the pristine in question is available at this time, rather
than on the pristineness of the working copy as a whole.
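
A minimal sketch of that per-file decision, in illustrative Python
rather than the real C code. The is_modified() helper, its parameters,
and the use of SHA-1 as the recorded checksum are all assumptions made
for the example:

```python
import filecmp
import hashlib
import os


def is_modified(working_path, recorded_checksum, pristine_path=None):
    """Hypothetical helper: return (modified, method_used).

    Chooses the comparison per file, based on whether this particular
    pristine is on disk right now, not on the working copy as a whole.
    """
    if pristine_path is not None and os.path.exists(pristine_path):
        # Pristine is available: compare contents byte for byte.
        same = filecmp.cmp(working_path, pristine_path, shallow=False)
        return (not same, "content")

    # Pristine is absent: hash the working file and compare checksums.
    digest = hashlib.sha1()  # assumed checksum kind, for illustration only
    with open(working_path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return (digest.hexdigest() != recorded_checksum, "checksum")
```

Note that the same file can take either branch on different
invocations, depending on whether its pristine happens to be present.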

Pros:

- performs the "best" comparison possible with the available
  information (if we consider a content comparison to be "better" or
  "more definitive" than a checksum comparison)

- future effort to allow more granular user control over pristines
  (rather than the all-or-nothing approach in 1.15.x) could benefit
  from such logic. Specifically, if a working copy is partially-
  pristined, I think we would want the content comparison performed
  for pristined files.

- content comparison might be more performant than checksum
  comparison, due to short-circuit evaluation when the first
  difference is encountered; no such shortcut is possible with
  checksum calculation.
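
To illustrate the short-circuit point, here is a small sketch in
illustrative Python. The function names, the chunk size, and the use of
SHA-1 are assumptions for the example, and it counts bytes read rather
than measuring time:

```python
import hashlib
import io

CHUNK = 4096  # assumed read size, for illustration only


def streams_differ(a, b, chunk=CHUNK):
    """Return (differ, bytes_read_from_a); stops at the first differing chunk."""
    read = 0
    while True:
        ca = a.read(chunk)
        cb = b.read(chunk)
        read += len(ca)
        if ca != cb:
            return True, read   # early exit: the rest is never read
        if not ca:
            return False, read  # both streams ended with no difference


def checksum_with_count(stream, chunk=CHUNK):
    """Return (hex_digest, bytes_read); must always read the whole stream."""
    digest = hashlib.sha1()
    read = 0
    while True:
        data = stream.read(chunk)
        if not data:
            return digest.hexdigest(), read
        digest.update(data)
        read += len(data)


pristine = b"A" * 1_000_000
working = b"B" + b"A" * 999_999  # differs in the very first byte

differ, compared = streams_differ(io.BytesIO(working), io.BytesIO(pristine))
_, hashed = checksum_with_count(io.BytesIO(working))
```

Here the content comparison stops after one chunk, while the checksum
has to consume the entire file. The advantage only applies when the
files actually differ; for identical files the content comparison
reads both the working file and the pristine in full.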

Cons:

- inconsistency: status checks of a file may behave differently at
  different times, since the pristine may be available during some
  invocations and unavailable in others.

Thoughts?

Cheers,
Nathan
