NightOwl888 commented on PR #1172: URL: https://github.com/apache/lucenenet/pull/1172#issuecomment-3263289030
I have worked out how to successfully cache NuGet packages. But after some research, I discovered that there are 2 caching goals that are typically employed (sometimes both at the same time): ### Restore caching Restore caching backs up and restores the local NuGet cache to the global GitHub Actions cache. This is the most impactful cache option for small to medium sized projects, since the time it takes to build is far dwarfed by the time it takes to restore. #### Cache key The cache key only needs to be based on a hash of every project's configuration of `<PackageRefernce>` elements and `<PackageVersion>` elements, and the current OS. ```bash # '**/*.*proj' includes .csproj, .vbproj, .fsproj, msbuildproj, etc. # '**/*.props' includes Directory.Packages.props, Directory.Build.props, and Dependencies.props # '**/*.targets' includes Directory.Build.targets # '**/*.sln' and '*.sln' ensure root solution files are included (minimatch glitch for file extension .sln) # 'global.json' included for SDK version changes key: nuget-v1-${{ runner.os }}-${{ hashFiles('**/*.*proj', '**/*.props', '**/*.targets', '**/*.sln', '*.sln', 'global.json') }} ``` #### Restore keys There is also a setting for `restore-keys`. This is meant for npm, pip, or other such package mangers where a partial hit is still valid. This approach does not usually work for NuGet caches, so the setting should be omitted. ### Incremental build caching Incremental build caching backs up and restores the local `obj` folders (build output) to the global GitHub Actions cache. This may be impactful for large projects, particularly those with many build jobs and common dependencies, but typically for small projects with only a single build it isn't worth the effort. This strategy takes advantage of the fact that the GitHub Actions cache is **shared** between build agents. #### Cache key The cache key is typically based on a hash of every project's `packages.lock.json` files, project files, `global.json` file, the OS, target framework, and architecture (x64, arm64, etc). ```bash key: build-${ runner.os }-${ matrix.tfm }-${ matrix.arch }-${ hashFiles('**/packages.lock.json', '**/*.*proj', 'global.json') } ``` For very large repositories, combining the project file hashes + `packages.lock.json` is the safest option. #### Restore keys In this case restore keys can make a difference when most of the projects don't change and only a few of them do. So, having a a fallback could improve performance a bit if there is a cache miss. #### Lock files The `packages.lock.json` files only make sense to have on the CI server for caching key purposes and there is a `dotnet restore --use-lock-file` command to create them on demand. There is no requirement to set the `RestorePackagesWithLockFile` setting or commit the lock files to the repository. So, either of these 2 options is non-intrusive: 1. Leave `RestorePackagesWithLockFile` unset and use `dotnet restore --use-lock-file` in CI 2. Set `RestorePackagesWithLockFile` and add `packages.lock.json` files to `.gitignore` > Technically, the `RestorePackagesWithLockFile` setting must be set for `dotnet` commands run after `dotnet restore` to honor the lock files. However, for caching purposes this doesn' matter because all we care about is key consistency. In this scenario, the lock files are **cache metadata**, not for dependency pinning. --------------------------- After looking at the configuration in this PR, it seems that it is mixing apples and oranges. ```yml - name: Cache NuGet Packages uses: actions/cache@v3 id: lucene-nuget-cache with: path: | ~/.nuget/packages key: `${{ runner.os }}-nuget-`${{ hashFiles('**/packages.lock.json') }} restore-keys: | `${{ runner.os }}-nuget-lucene - name: Restore dependencies run: dotnet restore ``` There are a few different issues with this approach. 1. The path should not be hard coded to `~/.nuget/packages` since the canonical way to retrieve the path is the `NUGET_PACKAGES` environment variable, which is used by the `dotnet restore` and other commands. 2. Using the hash of '**/packages.lock.json' without including target framework and architecture in the cache key is problematic. But even if that were straightened out so the cache is correctly busted, there is no reason to store NuGet caches based off of more than just OS. This arrangement will create extra unnecessary duplicate caches. The bug here is that there is a mismatch between what is being cached and how it is being keyed. 4. The restore key `${{ runner.os }}-nuget-lucene` is effectively the same thing as having no key, because it will never match a prefix of the pattern `${{ runner.os }}-nuget-${{ hashFiles('**/packages.lock.json') }}`. For the fallback to work, it must be a prefix (or created like this elsewhere in GitHub Actions). ------------------------- ### Proposal I propose that we change this PR to use **restore caching** only, because that will certainly have a big impact. From that point, we can decide whether it will be worth the effort to enable incremental build caching (separate PR). My best guess is that it will because we have so many parallel builds that share dependencies. I would be happy to fix the PR if you agree. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org