kevingurney commented on a change in pull request #10932:
URL: https://github.com/apache/arrow/pull/10932#discussion_r690534937



##########
File path: .github/workflows/matlab.yml
##########
@@ -0,0 +1,61 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: MATLAB
+
+on:
+  push:
+    paths:
+      - '.github/workflows/matlab.yml'
+      - 'ci/scripts/matlab*.sh'
+      - 'matlab/**'
+      - 'cpp/src/arrow/**'
+
+concurrency:
+  group: ${{ github.repository }}-${{ github.ref }}-${{ github.workflow }}
+  cancel-in-progress: true
+
+jobs:
+
+  matlab:
+    name: MATLAB 
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+      - name: Fetch Submodules and Tags
+        shell: bash
+        run: ci/scripts/util_checkout.sh
+      - name: Install MATLAB
+        uses: matlab-actions/setup-matlab@v0
+      - name: Build MATLAB Interface

Review comment:
   Thanks for suggesting that we use Ninja! It significantly speeds up local builds and, as you pointed out, has a noticeable impact on CI time.
   
   We investigated a number of other approaches to reducing the CI build time:
   
   1. Using precompiled headers (`-D ARROW_USE_PRECOMPILED_HEADERS=ON`)
   2. Using CMake Unity builds (`-D CMAKE_UNITY_BUILD=ON`)
   3. Using ccache (`-D ARROW_USE_CCACHE=ON`)
   4. Using `CMAKE_BUILD_PARALLEL_LEVEL`
   5. Using the gold linker (`-D ARROW_USE_LD_GOLD=ON`)
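   
   For reference, here is roughly how each variant above is toggled. This is a hedged sketch of a hypothetical GitHub Actions step that configures Arrow C++ directly from `cpp/`. The step name, build directory, and job count are illustrative assumptions, not the MATLAB workflow's actual build script. Note that `CMAKE_BUILD_PARALLEL_LEVEL` is an environment variable read by `cmake --build`, not a `-D` cache option.
   
   ```yaml
   - name: Build Arrow C++ (experiment variant)  # hypothetical step
     shell: bash
     env:
       # Read by `cmake --build` only; it does not affect the configure step.
       # The value 2 is an example, not a measured setting.
       CMAKE_BUILD_PARALLEL_LEVEL: 2
     run: |
       # Each -D flag corresponds to one experiment from the list above;
       # toggle them individually to measure each option's effect.
       cmake -S cpp -B cpp/build -G Ninja \
         -D ARROW_USE_PRECOMPILED_HEADERS=ON \
         -D CMAKE_UNITY_BUILD=ON \
         -D ARROW_USE_CCACHE=ON \
         -D ARROW_USE_LD_GOLD=ON
       cmake --build cpp/build
   ```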
   
   Based on my ad-hoc testing, none of the options listed above noticeably reduced build times. In fact, in most cases builds were slightly slower with these options enabled.
   
   Some of these approaches (in particular, precompiled headers and Unity/Jumbo builds) seem to be at odds with the parallel build strategy used by Ninja. In addition, Ninja appears to parallelize across all available cores by default, so, as far as I understand, explicitly setting `CMAKE_BUILD_PARALLEL_LEVEL` doesn't add much. Using the gold linker had a negligible impact in our observations. Perhaps there simply isn't enough linking in this build for the choice of linker to matter?
   
   In theory, `ccache` should help with clean builds when the cache is warm. However, for some reason, we don't see any reduction in build time with a warm cache, even though `ccache -s` reports a fairly high cache hit rate. It's possible we are misusing `ccache` in some way, but as far as we can tell, it doesn't give any major performance gains for this use case.
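   
   One way to check the hit rate from within CI is to zero the `ccache` statistics before the build and print them afterwards. A minimal sketch, assuming `ccache` is installed on the runner and that `matlab/build` is the (hypothetical) build directory:
   
   ```yaml
   - name: Build with ccache statistics  # hypothetical step
     shell: bash
     run: |
       ccache -z                   # reset the hit/miss counters
       cmake --build matlab/build  # hypothetical build directory
       ccache -s                   # print hits, misses, and cache size
   ```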
   
   I understand that the C++ CI build does use `ccache`, so if we are missing something about its performance impact, please let us know.
   
   ------------------------
   
   Overall, using Ninja as the CMake generator was the most useful change.
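   
   A sketch of what selecting Ninja might look like in a workflow step. This assumes Ninja comes from the Ubuntu `ninja-build` package and that the MATLAB interface is configured from `matlab/` into `matlab/build`; the directory layout and step contents are illustrative assumptions, not copied from this PR.
   
   ```yaml
   - name: Install Ninja
     shell: bash
     run: sudo apt-get update && sudo apt-get install -y ninja-build
   - name: Build MATLAB Interface  # hypothetical step contents
     shell: bash
     run: |
       # -G Ninja selects the Ninja generator. Ninja chooses a parallel
       # job count on its own, so no -j flag is passed here.
       cmake -S matlab -B matlab/build -G Ninja
       cmake --build matlab/build
   ```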
   
   As a side note: we now realize that we may be able to reduce the build time a bit more by building the Arrow C++ libraries separately. We could then `apt-get install` GoogleTest for running the MATLAB Interface C++ tests. Installing GoogleTest from the Ubuntu package repositories should spare us from building and running all of the Arrow C++ tests as part of the MATLAB CI build. We've captured this work in [ARROW-13647](https://issues.apache.org/jira/browse/ARROW-13647).
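   
   A rough sketch of that follow-up, assuming Ubuntu's `libgtest-dev` package, Ninja already installed, and a separate Arrow C++ build with its own test suite disabled via `ARROW_BUILD_TESTS=OFF`. How the MATLAB build then locates the prebuilt Arrow libraries is left open here.
   
   ```yaml
   - name: Install GoogleTest
     shell: bash
     run: sudo apt-get update && sudo apt-get install -y libgtest-dev
   - name: Build Arrow C++ without its test suite  # hypothetical step
     shell: bash
     run: |
       # Skip building (and later running) the Arrow C++ unit tests.
       cmake -S cpp -B cpp/build -G Ninja -D ARROW_BUILD_TESTS=OFF
       cmake --build cpp/build
   ```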
   
   If you have additional suggestions for reducing the CI time, we are happy to investigate them further. Thank you!

##########
File path: .github/workflows/matlab.yml
##########
@@ -0,0 +1,61 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: MATLAB
+
+on:
+  push:
+    paths:
+      - '.github/workflows/matlab.yml'
+      - 'ci/scripts/matlab*.sh'
+      - 'matlab/**'
+      - 'cpp/src/arrow/**'
+
+concurrency:
+  group: ${{ github.repository }}-${{ github.ref }}-${{ github.workflow }}
+  cancel-in-progress: true
+
+jobs:
+
+  matlab:
+    name: MATLAB 
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+      - name: Fetch Submodules and Tags
+        shell: bash
+        run: ci/scripts/util_checkout.sh
+      - name: Install MATLAB
+        uses: matlab-actions/setup-matlab@v0
+      - name: Build MATLAB Interface

Review comment:
   My apologies!
   
   I just realized I made a mistake when comparing the performance of the MATLAB CI build with and without `ccache`. I assumed that simply omitting `-DARROW_USE_CCACHE=ON` would disable `ccache`. However, `ARROW_USE_CCACHE` defaults to `ON`, so `ccache` is used unless the option is explicitly turned off.
   
   After comparing the build times again (this time explicitly setting `-DARROW_USE_CCACHE=OFF` for the baseline), it appears that `ccache` does give fairly substantial performance gains for clean builds with a warm cache.
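   
   Because the option is on by default, a fair baseline has to disable it explicitly. A minimal sketch of the two configurations being compared; the step names and build directories are hypothetical, and Ninja is assumed to be installed.
   
   ```yaml
   - name: Baseline build without ccache  # hypothetical step
     shell: bash
     run: |
       # Omitting the flag is not enough: ARROW_USE_CCACHE defaults to ON.
       cmake -S cpp -B cpp/build-nocache -G Ninja -D ARROW_USE_CCACHE=OFF
       time cmake --build cpp/build-nocache
   - name: Build with ccache (default)  # hypothetical step
     shell: bash
     run: |
       cmake -S cpp -B cpp/build-ccache -G Ninja
       time cmake --build cpp/build-ccache
   ```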
   
   Sorry again for the confusion!
   
   We'll look into properly integrating `ccache` into GitHub Actions by following the approach used by the other language bindings' [`.github/workflows/*.yml`](https://github.com/apache/arrow/blob/820e5061847c9d6d261c416e57d6013321175565/.github/workflows/cpp.yml#L301) files.
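   
   A hedged sketch of what that integration might look like, assuming `actions/cache` persists a `ccache` directory between runs and `CCACHE_DIR` points the build at it. The cache path, key, size limit, and the `ci/scripts/matlab_build.sh` script name are assumptions for illustration, not taken from `cpp.yml`.
   
   ```yaml
   - name: Install ccache
     shell: bash
     run: sudo apt-get update && sudo apt-get install -y ccache
   - name: Cache ccache
     uses: actions/cache@v2
     with:
       path: .ccache
       # The key changes whenever C++ sources change; restore-keys then falls
       # back to the most recent cache whose key starts with this prefix.
       key: matlab-ccache-${{ runner.os }}-${{ hashFiles('cpp/src/**') }}
       restore-keys: matlab-ccache-${{ runner.os }}-
   - name: Build MATLAB Interface
     shell: bash
     env:
       CCACHE_DIR: ${{ github.workspace }}/.ccache
       CCACHE_COMPRESS: 1
       CCACHE_MAXSIZE: 500M
     run: ci/scripts/matlab_build.sh  # hypothetical script name
   ```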

##########
File path: .github/workflows/matlab.yml
##########
@@ -0,0 +1,61 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+name: MATLAB
+
+on:
+  push:
+    paths:
+      - '.github/workflows/matlab.yml'
+      - 'ci/scripts/matlab*.sh'
+      - 'matlab/**'
+      - 'cpp/src/arrow/**'
+
+concurrency:
+  group: ${{ github.repository }}-${{ github.ref }}-${{ github.workflow }}
+  cancel-in-progress: true
+
+jobs:
+
+  matlab:
+    name: MATLAB 
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+      - name: Fetch Submodules and Tags
+        shell: bash
+        run: ci/scripts/util_checkout.sh
+      - name: Install MATLAB
+        uses: matlab-actions/setup-matlab@v0
+      - name: Build MATLAB Interface

Review comment:
   In the spirit of incremental delivery, we think it makes sense to enable `ccache` support in a follow-up pull request. We've captured this work in [ARROW-13658](https://issues.apache.org/jira/browse/ARROW-13658). Thank you!




-- 
This is an automated message from the Apache Git Service.