kevingurney commented on a change in pull request #10932: URL: https://github.com/apache/arrow/pull/10932#discussion_r690534937
##########
File path: .github/workflows/matlab.yml
##########
@@ -0,0 +1,61 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: MATLAB
+
+on:
+  push:
+    paths:
+      - '.github/workflows/matlab.yml'
+      - 'ci/scripts/matlab*.sh'
+      - 'matlab/**'
+      - 'cpp/src/arrow/**'
+
+concurrency:
+  group: ${{ github.repository }}-${{ github.ref }}-${{ github.workflow }}
+  cancel-in-progress: true
+
+jobs:
+
+  matlab:
+    name: MATLAB
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+      - name: Fetch Submodules and Tags
+        shell: bash
+        run: ci/scripts/util_checkout.sh
+      - name: Install MATLAB
+        uses: matlab-actions/setup-matlab@v0
+      - name: Build MATLAB Interface

Review comment:
Thanks for suggesting that we use Ninja! This significantly speeds up local builds and has a noticeable impact on the CI time, as you've pointed out.

We investigated a number of other approaches to reducing the CI build time:

1. Using precompiled headers (`-D ARROW_USE_PRECOMPILED_HEADERS=ON`)
2. Using CMake Unity builds (`-D CMAKE_UNITY_BUILD=ON`)
3. Using ccache (`-D ARROW_USE_CCACHE=ON`)
4. Using `CMAKE_BUILD_PARALLEL_LEVEL`
5. Using the gold linker (`-D ARROW_USE_LD_GOLD=ON`)

Based on my ad-hoc testing, none of the options listed above appears to reduce the build time in any noticeable way. In fact, in most cases builds were slightly slower with these options enabled.

Some of these approaches (in particular, precompiled headers and Unity/Jumbo builds) seem to be at odds with the parallel build strategy used by Ninja. In addition, Ninja seems to parallelize over all available cores by default, so, as far as I understand, explicitly setting `CMAKE_BUILD_PARALLEL_LEVEL` doesn't do much.

Using the gold linker had a negligible impact in our testing. Perhaps this is because there isn't enough linking going on here for it to matter?

The `ccache` approach seems like it should, in theory, help when running clean builds with a warm cache. However, for some reason, we don't see any reduction in build time with a warm cache, even though `ccache -s` showed a fairly high cache hit rate. It's possible we are misusing `ccache` in some way, but as far as we can tell, it isn't giving any major performance gains for this use case. I understand that the C++ CI build does use `ccache`, so if we are missing something about its performance impact, please let us know.

------------------------

Overall, using Ninja as the CMake generator was the most useful change.

As a side note, we now realize that we may be able to reduce the build time a bit more by building the Arrow C++ libraries separately. We could then `apt-get install` GoogleTest for running the MATLAB Interface C++ tests. Installing GoogleTest separately from the Ubuntu package repositories should spare us from building and running all of the Arrow C++ tests as part of the MATLAB CI build. We've captured this work in [ARROW-13647](https://issues.apache.org/jira/browse/ARROW-13647).
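
For reference, the configure variants compared above could be sketched roughly as follows. This is only an illustration: the `matlab` source directory and the `build` root are hypothetical placeholders, the script prints the commands instead of running them, and whether each definition belongs to the MATLAB interface build or to the Arrow C++ build is an assumption here. The flags themselves are the ones listed in this comment.

```shell
#!/usr/bin/env bash
# Sketch: print the configure command for a baseline Ninja build and for
# some of the options tried above. Nothing is executed; the commands are
# only echoed.
set -euo pipefail

# Hypothetical placeholders, not paths taken from the pull request.
SOURCE_DIR="matlab"
BUILD_ROOT="build"

# Print one cmake configure command. $1 names the variant (used as the
# build subdirectory); any remaining arguments are extra cache definitions.
configure_command() {
  local variant="$1"
  shift
  local args=(cmake -S "${SOURCE_DIR}" -B "${BUILD_ROOT}/${variant}" -G Ninja "$@")
  echo "${args[*]}"
}

configure_command baseline
configure_command pch -D ARROW_USE_PRECOMPILED_HEADERS=ON
configure_command unity -D CMAKE_UNITY_BUILD=ON
configure_command gold -D ARROW_USE_LD_GOLD=ON

# CMAKE_BUILD_PARALLEL_LEVEL is an environment variable read by
# `cmake --build`, so it is set at build time rather than passed with -D.
echo "CMAKE_BUILD_PARALLEL_LEVEL=4 cmake --build ${BUILD_ROOT}/baseline"
```

Giving each variant its own build directory keeps the comparisons independent, so one variant's cached configuration cannot leak into the next.
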
If you have additional suggestions for reducing the CI time, we are happy to investigate them further.

Thank you!

##########
File path: .github/workflows/matlab.yml
##########

Review comment:
My apologies! I just realized I made a mistake when comparing the performance of the MATLAB CI build with and without `ccache`.

I assumed that simply omitting `-DARROW_USE_CCACHE=ON` would mean `ccache` wasn't used. However, the default behavior is to use `ccache` whether or not the flag is specified.
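
A corrected comparison needs the setting to be explicit in both directions. The sketch below shows the idea under some assumptions: the `matlab` source directory and `build/ccache-*` directories are hypothetical placeholders, and the script only prints the steps rather than running them. `ccache -z` zeroes the statistics counters and `ccache -s` shows them.

```shell
#!/usr/bin/env bash
# Sketch: print the steps for one clean build with ARROW_USE_CCACHE forced
# to an explicit value, so that the default (ccache enabled) cannot skew
# the comparison. Nothing is executed; the steps are only echoed.
set -euo pipefail

# $1 is the explicit setting, ON or OFF. Paths are hypothetical placeholders.
comparison_steps() {
  local setting="$1"
  local build_dir="build/ccache-${setting}"
  echo "rm -rf ${build_dir}"   # start from a clean build directory
  echo "ccache -z"             # zero the ccache statistics
  echo "cmake -S matlab -B ${build_dir} -G Ninja -D ARROW_USE_CCACHE=${setting}"
  echo "time cmake --build ${build_dir}"
  echo "ccache -s"             # show hit/miss statistics for this build
}

comparison_steps ON
comparison_steps OFF
```

Zeroing the statistics before each build makes the hit rate reported afterwards specific to that build, which helps confirm whether `ccache` was really in use.
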
After comparing the build times again, this time explicitly setting `-DARROW_USE_CCACHE=OFF`, it looks like `ccache` does give fairly substantial performance gains for a clean build with a warm cache.

Sorry again for the confusion! We'll look into properly integrating `ccache` in GitHub Actions by following the approach used by the other language bindings' [`.github/workflows/*.yml`](https://github.com/apache/arrow/blob/820e5061847c9d6d261c416e57d6013321175565/.github/workflows/cpp.yml#L301) files.

##########
File path: .github/workflows/matlab.yml
##########

Review comment:
In the spirit of incremental delivery, we think it makes sense to enable `ccache` support in a follow-up pull request. We captured this work in: https://issues.apache.org/jira/browse/ARROW-13658.

Thank you!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
