NightOwl888 commented on PR #1172:
URL: https://github.com/apache/lucenenet/pull/1172#issuecomment-3263289030

   I have worked out how to successfully cache NuGet packages.
   
   But after some research, I discovered that there are 2 caching goals that 
are typically employed (sometimes both at the same time):
   
   ### Restore caching
   
   Restore caching backs up and restores the local NuGet cache to the global 
GitHub Actions cache. This is the most impactful cache option for small to 
medium sized projects, since the time it takes to build is far dwarfed by the 
time it takes to restore.
   
   #### Cache key
   
   The cache key only needs to be based on a hash of every project's 
configuration of `<PackageRefernce>` elements and `<PackageVersion>` elements, 
and the current OS.
   
   ```bash
   # '**/*.*proj' includes .csproj, .vbproj, .fsproj, msbuildproj, etc.
   # '**/*.props' includes Directory.Packages.props, Directory.Build.props, and 
Dependencies.props
   # '**/*.targets' includes Directory.Build.targets
   # '**/*.sln' and '*.sln' ensure root solution files are included (minimatch 
glitch for file extension .sln)
   # 'global.json' included for SDK version changes
   key: nuget-v1-${{ runner.os }}-${{ hashFiles('**/*.*proj', '**/*.props', 
'**/*.targets', '**/*.sln', '*.sln', 'global.json') }}
   ```
   
   #### Restore keys
   
   There is also a setting for `restore-keys`. This is meant for npm, pip, or 
other such package mangers where a partial hit is still valid. This approach 
does not usually work for NuGet caches, so the setting should be omitted. 
   
   ### Incremental build caching
   
   Incremental build caching backs up and restores the local `obj` folders 
(build output) to the global GitHub Actions cache. This may be impactful for 
large projects, particularly those with many build jobs and common 
dependencies, but typically for small projects with only a single build it 
isn't worth the effort. This strategy takes advantage of the fact that the 
GitHub Actions cache is **shared** between build agents.
   
   #### Cache key
   
   The cache key is typically based on a hash of every project's 
`packages.lock.json` files, project files, `global.json` file, the OS, target 
framework, and architecture (x64, arm64, etc).
   
   ```bash
   key: build-${ runner.os }-${ matrix.tfm }-${ matrix.arch }-${ 
hashFiles('**/packages.lock.json', '**/*.*proj', 'global.json') }
   ```
   
   For very large repositories, combining the project file hashes + 
`packages.lock.json` is the safest option.
   
   #### Restore keys
   
   In this case restore keys can make a difference when most of the projects 
don't change and only a few of them do. So, having a a fallback could improve 
performance a bit if there is a cache miss.
   
   #### Lock files
   
   The `packages.lock.json` files only make sense to have on the CI server for 
caching key purposes and there is a `dotnet restore --use-lock-file` command to 
create them on demand. There is no requirement to set the 
`RestorePackagesWithLockFile` setting or commit the lock files to the 
repository. So, either of these 2 options is non-intrusive:
   
   1. Leave `RestorePackagesWithLockFile` unset and use `dotnet restore 
--use-lock-file` in CI
   2. Set `RestorePackagesWithLockFile` and add `packages.lock.json` files to 
`.gitignore`
   
   > Technically, the `RestorePackagesWithLockFile` setting must be set for 
`dotnet` commands run after `dotnet restore` to honor the lock files. However, 
for caching purposes this doesn' matter because all we care about is key 
consistency. In this scenario, the lock files are **cache metadata**, not for 
dependency pinning.
   
   ---------------------------
   
   After looking at the configuration in this PR, it seems that it is mixing 
apples and oranges.
   
   ```yml
         - name: Cache NuGet Packages
           uses: actions/cache@v3
           id: lucene-nuget-cache
           with:
             path: |
               ~/.nuget/packages
             key: `${{ runner.os }}-nuget-`${{ 
hashFiles('**/packages.lock.json') }}
             restore-keys: |
               `${{ runner.os }}-nuget-lucene
         - name: Restore dependencies
           run: dotnet restore
   ```
   
   There are a few different issues with this approach.
   
   1. The path should not be hard coded to `~/.nuget/packages` since the 
canonical way to retrieve the path is the `NUGET_PACKAGES` environment 
variable, which is used by the `dotnet restore` and other commands.
   2. Using the hash of '**/packages.lock.json' without including target 
framework and architecture in the cache key is problematic. But even if that 
were straightened out so the cache is correctly busted, there is no reason to 
store NuGet caches based off of more than just OS. This arrangement will create 
extra unnecessary duplicate caches. The bug here is that there is a mismatch 
between what is being cached and how it is being keyed.
   4. The restore key `${{ runner.os }}-nuget-lucene` is effectively the same 
thing as having no key, because it will never match a prefix of the pattern 
`${{ runner.os }}-nuget-${{ hashFiles('**/packages.lock.json') }}`. For the 
fallback to work, it must be a prefix (or created like this elsewhere in GitHub 
Actions).
   
   -------------------------
   
   ### Proposal
   
   I propose that we change this PR to use **restore caching** only, because 
that will certainly have a big impact. From that point, we can decide whether 
it will be worth the effort to enable incremental build caching (separate PR). 
My best guess is that it will because we have so many parallel builds that 
share dependencies.
   
   I would be happy to fix the PR if you agree.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to