On 02/18, Lars Schneider wrote:
>
> On 17 Feb 2016, at 19:58, Matthieu Moy <[email protected]> wrote:
>
> > Lars Schneider <[email protected]> writes:
> >
> >> Coincidentally I started working on similar thing already (1) and I have
> >> lots of ideas around it.
> >
> > I guess it's time to start sharing these ideas then ;-).
> >
> > I think there's a lot to do. If we want to push this idea as a GSoC
> > project, we need:
> >
> > * A rough plan. We can't expect students to read a vague text like
> > "let's make Git safer" and write a real proposal out of it.
> >
> > * A way to start this rough plan incrementally (i.e. first step should
> > be easy and mergeable without waiting for next steps).
> >
> > Feel free to start writting an idea for
> > http://git.github.io/SoC-2016-Ideas/. It'd be nice to have a few more
> > ideas before Friday. We can polish them later if needed.
>
> I published my ideas here:
> https://github.com/git/git.github.io/pull/125/files
Sorry for posting my idea so late, but it took me a while to write
this all up, and life has a habit of getting in the way. My idea goes
into a different direction than yours.
I do like the remote whitelist/blacklist project.
Junio pointed out to me off list that this is to complicated for a
GSoC project. I kind of agree with that, but I wanted to see how this
could be split up, to completely convince myself as well. And indeed,
the more I think about it the more risky it seems.
Below there are some thoughts on a potential design, in case someone
is interested, no code to back any of this up, sorry.
Everything proposed below should be hidden behind some configuration
variable, potentially one per command (?)
- start with git-clean. It's well defined which files are cleaned
from a repository when running the command. Add them to a commit on
the tip of the current branch.
Start a new branch (or use the existing one if applicable) in
refs/restore/history, and add a commit including a notes file. The
commit message contains the operation that was executed (clean in
this case), and the hash of the commit we created which includes the
cleaned files.
Add a note to the commit, detailing from which command we come from,
which files we added (not strictly necessary, as we can infer it
from the parent commit).
Useful in itself as the user can recover the files manually if
needed, and can be sent as separate patch series.
Potential problems: Git has no way to track directories. This can
be mitigated by keeping the list of directories in the attached
note.
- add a git recover command. The command looks at This would look like `git
recover
<commit>`, where commit is the hash of the commit we saved before.
This works by reading the note attached to the commit, figuring out
which command was run before, and restoring the state we were in
before.
Potential problems: conflicts, but I think this can be solved by
simply erroring out, at least in the first iteration.
- the next command could be git mv -f, git reset -f and friends. It
gets more tricky here, as we'll have to deal with the state of the
files in the index.
Analogous to git clean, the changes in the working tree are all
staged and added to a new commit on the tip of the current branch.
The note on this commit needs to contain the necessary data to
rebuild the state in the index. The format is more closely
specified below. We also need the corresponding changes in the
git restore command.
Restored files will be written to disk as racily smudged, so the
contents are checked by git, as we lost the meta-data anyway. This
comes at a slight performance impact, but I think that's okay as we
potentially saved the user a lot of time re-doing all the changes.
- git branch/tag --force. Store the name and the old location of the
branch in refs/restore/history. There are no files lost with this
operation, so no additional commits as for git clean or git reset
etc. are needed. The format of the commit depends on the exact
operation that was forced, for exact format see below.
This treatment can't make all operations safe. Any operation that
touches the remote is hard to undo as some users already might have
fetched the new state of the remote (e.g. git push -f). Others such
as git-gc will inevitably delete information from the disk, but
changing that
There's more, but I don't think just writing up all commands without
any code would make any sense.
Formats:
- commits in refs/restore/history:
empty commits with the following commit message format for git-clean
and git-reset and friends:
$versionnumber\n
$command\n
$branchname\n
$sha1ofreferencedcommit\n
empty commits with the following commit message format for git branch
and friends
$versionnumber\n
$command\n (this includes the exact operation that was forced
(e.g. move, delete etc.)
$branchname\n
$sha1ThatWasReferencedByTheBranch\n
$overwrittenbranchname\n (this and the sha1 below are only used for
--move)
$sha1ReferencedByOverwrittenBranch\n
- notes file: The format can be different for different commands, as
they all have different needs
- git clean:
list of affected files and directories separated by '\0'.
I think we could get away with only the directories, but adding
the filenames as well might make the recovery part simpler.
- git reset, etc.:
the following info is stored for each file that is modified by the
original command.
32-bit signature
32-bit number of index entries
32-bit mode (object type + unix permissions)
160-bit SHA-1
16-bit flags (extra careful here what we want to do with the
assume valid flag)
path name (variable length)
resolve-undo extension (same format as in the index)
Alternatives:
- Have a history for each branch in refs/restore/$branchname.
* Advantages:
Each branch has its own history, which can lead to fewer conflicts
when restoring (e.g. user uses `git reset --hard` on one branch,
switches to another branch works (potentially adds more stuff to
this branch), later goes back to the old branch and discovers `git
reset --hard` was actually the wrong thing to do and would like
the data back.
* Disadvantages:
It is harder for the user to intuitively know what git restore
will do exactly.
It's much more limited when we want to extend it to branch
removals, etc.
- Storing additional information in the refs/restore/history ref
* Advantages:
No need for extra notes
* Disadvantages:
Data doesn't get garbage collected without user interaction,
potentially blowing up the repository size. Especially using `git
clean`, where binary files might be involved.
- Store the whole index in the note
* Advantages:
Simpler way of restoring the index (including all of the
extensions)
* Disadvantages:
Need to take care of both the index and the split index.
Will consume a lot more disk space in the normal case (only a few
of the files in the repository are changed, while the majority
remains unchanged).
- Store the changed files in refs/restore/history instead of a new
commit on the tip of the current branch.
* Advantages:
All the information is in one place.
Data will not be garbage collected.
* Disadvantages:
Data will not be garbage collected. (Repository size is probably
going to blow up after a while)
It takes more effort to find the parent and diff against it.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html