Nguyễn Thái Ngọc Duy <[email protected]> writes:
> The use case is
>
> tar -xzf bigproject.tar.gz
> cd bigproject
> git init
> git add .
> # git grep or something
Two obvious thoughts, and a half.
(1) This particular invocation of "git add" can easily detect that
it is run in a repository with no $GIT_INDEX_FILE yet, which is
the most typical case for a big initial import. It could even
ask if the current branch is unborn if you wanted to make the
heuristic more specific to this use case. Perhaps it would
make sense to automatically plug the bulk import machinery in
such a case without an option?
(2) Imagine performing a dry-run of update_files_in_cache() using a
different diff-files callback that is similar to the
update_callback() but that uses the lstat(2) data to see how
big an import this really is, instead of calling
add_file_to_index(), before actually registering the data to
the object database. If you benchmark to see how expensive it
is, you may find that such a scheme might be a workable
auto-tuning mechanism to trigger this. Even if it were
moderately expensive, when combined with the heuristics above
for (1), it might be a worthwhile thing to do only when it is
likely to be an initial import.
(3) Is it always a good idea to send everything to a packfile on a
large addition, or are you often better off importing the
initial fileset as loose objects? If the latter, then the
option name "--bulk" may give users a wrong hint "if you are
doing a bulk-import, you are better off using this option".
This is a very logical extension to what was started at 568508e7
(bulk-checkin: replace fast-import based implementation,
2011-10-28), and I like it. I suspect "--bulk=<threshold>" might
be a better alternative than setting the threshold unconditionally
to zero, though.