akarbown commented on pull request #20474:
URL: https://github.com/apache/incubator-mxnet/pull/20474#issuecomment-938160202


   @szha, @leezu, @barry-jin - could I ask you to review this PR?
   
   I've added a new GitHub Actions pipeline to test and verify this change with 
oneMKL on macOS. It revealed several things that needed fixing on that OS, 
i.e.:
   - removed the use of `-Wl,--start-group` / `-Wl,--end-group` when linking 
static MKL libraries in FindBLAS.cmake (issue mentioned here: 
https://gitlab.kitware.com/cmake/cmake/-/issues/20548);
   - set the proper threading layer at runtime. According to the 
[documentation](https://software.intel.com/content/dam/develop/external/us/en/documents/onemkl-developerguide-mac.pdf)
 it needs to be set with `mkl_set_threading_layer(MKL_THREADING_INTEL)`;
   - added a GitHub action (based on the existing 
[os_x_staticbuild.yml](https://github.com/apache/incubator-mxnet/blob/master/.github/workflows/os_x_staticbuild.yml))
 that builds with static oneMKL libraries (to be consistent with the 
already existing scripts) and runs the same tests as 
os_x_staticbuild.yml plus the MKL tests;
   - fixed hangs that appeared while running those tests. They were caused by 
numpy linking against OpenBLAS instead of MKL BLAS, which in turn pulled in 
libgomp and left two OpenMP runtimes in one process. Recompiling numpy (done 
in the numpy_mkl.sh file) resolved the issue;
   - excluded the `test_bf16_operator` tests from that pipeline, as the macOS 
CI machines do not seem to support AVX-512;
   - tested MXNet locally on macOS linked against the static, dynamic and SDL 
(Single Dynamic Library) variants; all the tests (from os_x_staticbuild.yml 
plus the MKL tests) seem to pass without any hang.
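
   To illustrate the "two OpenMP runtimes in one process" problem from the 
list above, here is a minimal sketch of a check that flags it. It is not part 
of this PR: the helper names and the example paths are hypothetical, and the 
list of library paths is assumed to be collected separately (for example from 
`otool -L` output on macOS).

```python
import os

# File-name stems of common OpenMP runtimes (GNU, Intel, LLVM).
OPENMP_RUNTIMES = {
    "libgomp": "GNU OpenMP",
    "libiomp5": "Intel OpenMP",
    "libomp": "LLVM OpenMP",
}


def find_openmp_runtimes(library_paths):
    """Return the sorted names of the distinct OpenMP runtimes in library_paths."""
    found = set()
    for path in library_paths:
        name = os.path.basename(path)
        # Strip version and extension: "libgomp.1.dylib" -> "libgomp".
        stem = name.split(".", 1)[0]
        if stem in OPENMP_RUNTIMES:
            found.add(OPENMP_RUNTIMES[stem])
    return sorted(found)


def has_openmp_conflict(library_paths):
    """True when more than one OpenMP runtime would share the process."""
    return len(find_openmp_runtimes(library_paths)) > 1


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    loaded = [
        "/usr/local/lib/libgomp.1.dylib",
        "/opt/intel/lib/libiomp5.dylib",
        "/usr/lib/libSystem.B.dylib",
    ]
    print(find_openmp_runtimes(loaded))
    print(has_openmp_conflict(loaded))
```

   A check like this only matches on file names, so it is a rough 
diagnostic rather than proof of a conflict.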
   
   The change now seems to be tested and checked on macOS with MKL BLAS. 
   Do you think it makes sense to keep the new GitHub action for MKL on 
macOS? If so, can it stay as it is, or should I change it somehow?
   
   **Remark**: I see that windows-gpu fails, but this is probably not related 
to this change; it may instead be related to the VS 2019 version 16.11 release. 
For v16.8.1 (MSVC 19.28.29333.0) it 
[passed](https://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/windows-gpu/branches/PR-20474/runs/12/nodes/40/steps/84/log/?start=0)
 without any issues, while for v16.11.4 (MSVC 19.29.30136.0) it 
[fails](https://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/windows-gpu/branches/PR-20474/runs/13/nodes/40/steps/84/log/?start=0).
 But I'm not 100% sure. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


