Glad to see you tackling this. This is definitely a step in the right direction.

I realize that it will take a lot of work and that intermediate steps may just push the global state one level higher, but eventually it would be great to see an entire code path free of global state!

I'm personally interested because reducing the reliance on global state also helps our performance work: it makes it more feasible to use threading to scale up performance.

Ben

On 5/18/2017 7:21 PM, Brandon Williams wrote:
When I first started working on the git project I found it very difficult to
understand parts of the code base because of the inherently global nature of
our code.  It also made working on submodules very difficult.  Since we can
only open up a single repository per process, you need to launch a child
process in order to process a submodule.  But you also need to be able to
communicate other stateful information to the child processes so that the
submodules know how best to format their output or match against a
pathspec.  It ends up feeling like layering on hack after hack.  What I would
really like is to have a repository object so that I
can open a submodule in-process.

Before this becomes a reality for all commands, much of the library code would
need to be refactored in order to work purely on handles instead of global
state.  As it turned out, ls-files is a pretty simple command and doesn't have
*too* many dependencies.  The biggest thing that needed to be changed was
piping through an index into a couple library routines so that they don't
inherently rely on 'the_index'.  A few of these changes I've sent out and can
be found at 'origin/bw/pathspec-sans-the-index' and
'origin/bw/dir-c-stops-relying-on-the-index' which this series is based on.
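
To make the idea of "piping through an index" concrete, here is a minimal
sketch of the pattern, not code from the series.  Every name in it
(toy_index, toy_the_index, toy_path_in_index, toy_path_in_the_index) is
hypothetical and much simpler than git's real index code; it only shows a
routine taking an explicit handle, with a thin wrapper kept for callers that
still rely on the global.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an index; not git's real struct. */
struct toy_index {
	const char **paths;
	size_t nr;
};

/* The single process-wide instance that older callers implicitly use. */
static struct toy_index toy_the_index;

/* Refactored routine: the caller says which index to search. */
static int toy_path_in_index(const struct toy_index *istate, const char *path)
{
	size_t i;

	assert(istate != NULL);
	for (i = 0; i < istate->nr; i++)
		if (!strcmp(istate->paths[i], path))
			return 1;
	return 0;
}

/* Compatibility wrapper so unconverted callers keep working. */
static int toy_path_in_the_index(const char *path)
{
	return toy_path_in_index(&toy_the_index, path);
}
```

With the handle as a parameter, the same routine can be called once for the
superproject's index and once for a submodule's index in the same process,
while unconverted callers go through the wrapper.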

Patches 1-16 are refactorings to prepare either library code or ls-files itself
to be ready to handle passing around an index struct.  Patches 17-22 introduce
a repository struct and change a couple of things about how submodule caches
work (getting submodule information from .gitmodules).  And Patch 23 converts
ls-files to use a repository struct.
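
As a rough illustration of what a repository handle buys, the sketch below
bundles per-repository state into one struct so that two repositories can be
open in the same process.  It is an assumption-laden toy, not the struct
introduced in patch 17: the names toy_repo, toy_repo_init and toy_repo_clear
and the choice of fields are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle holding state that would otherwise be global. */
struct toy_repo {
	char *gitdir;   /* e.g. ".git" or ".git/modules/sub" */
	char *worktree; /* root of the working tree */
};

/* Portable helper: duplicate a string, returning NULL on allocation failure. */
static char *toy_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Release everything owned by the handle and reset it to an empty state. */
static void toy_repo_clear(struct toy_repo *repo)
{
	assert(repo != NULL);
	free(repo->gitdir);
	free(repo->worktree);
	memset(repo, 0, sizeof(*repo));
}

/* Fill in a handle; returns 0 on success, -1 on allocation failure. */
static int toy_repo_init(struct toy_repo *repo, const char *gitdir,
			 const char *worktree)
{
	assert(repo != NULL);
	memset(repo, 0, sizeof(*repo));
	repo->gitdir = toy_strdup(gitdir);
	repo->worktree = toy_strdup(worktree);
	if (!repo->gitdir || !repo->worktree) {
		toy_repo_clear(repo);
		return -1;
	}
	return 0;
}
```

A caller could initialize one handle for the superproject and another for a
submodule, and pass whichever one it needs to library code instead of
launching a child process.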

The most interesting part of the series is patches 17-23.  Patches 1-16 could
be taken as is without the rest of the series.

This is still very much in a WIP state, though it does pass all tests.  What
I'm hoping for here is to get a discussion started about the feasibility of a
change like this and hopefully to get the ball rolling.  Is this a direction we
want to move in?  Is it worth the pain?

Thanks for taking the time to look at this and entertain my insane ideas :)

Brandon Williams (23):
  convert: convert get_cached_convert_stats_ascii to take an index
  convert: convert crlf_to_git to take an index
  convert: convert convert_to_git_filter_fd to take an index
  convert: convert convert_to_git to take an index
  convert: convert renormalize_buffer to take an index
  tree: convert read_tree to take an index parameter
  ls-files: convert overlay_tree_on_cache to take an index
  ls-files: convert write_eolinfo to take an index
  ls-files: convert show_killed_files to take an index
  ls-files: convert show_other_files to take an index
  ls-files: convert show_ru_info to take an index
  ls-files: convert ce_excluded to take an index
  ls-files: convert prune_cache to take an index
  ls-files: convert show_files to take an index
  ls-files: factor out debug info into a function
  ls-files: factor out tag calculation
  repo: introduce new repository object
  repo: add index_state to struct repo
  repo: add per repo config
  submodule-config: refactor to allow for multiple submodule_cache's
  repo: add repo_read_gitmodules
  submodule: add is_submodule_active
  ls-files: use repository object

 Makefile                               |   1 +
 apply.c                                |   2 +-
 builtin/blame.c                        |   2 +-
 builtin/commit.c                       |   3 +-
 builtin/ls-files.c                     | 348 ++++++++++++++++-----------------
 cache.h                                |   4 +-
 combine-diff.c                         |   2 +-
 config.c                               |   2 +-
 convert.c                              |  31 +--
 convert.h                              |  19 +-
 diff.c                                 |   6 +-
 dir.c                                  |   2 +-
 git.c                                  |   2 +-
 ll-merge.c                             |   2 +-
 merge-recursive.c                      |   4 +-
 repo.c                                 | 112 +++++++++++
 repo.h                                 |  22 +++
 sha1_file.c                            |   6 +-
 submodule-config.c                     |  40 +++-
 submodule-config.h                     |  10 +
 submodule.c                            |  51 +++++
 submodule.h                            |   2 +
 t/t3007-ls-files-recurse-submodules.sh |  39 ++++
 tree.c                                 |  28 ++-
 tree.h                                 |   3 +-
 25 files changed, 513 insertions(+), 230 deletions(-)
 create mode 100644 repo.c
 create mode 100644 repo.h
