Github user fommil commented on the pull request: https://github.com/apache/incubator-spark/pull/575#issuecomment-35879891

Hi all,

The discussions with ASF on the LEGAL ticket have exposed some concerns - **unrelated to the LGPL** - that I think everybody needs to be aware of regarding native BLAS/LAPACK libraries.

Basically, the ASF need to bundle and license their projects in a way that is easy for distributors and end users to understand. They have gone to a lot of effort to authorise "Category A" and "Category B" licenses so that there are no surprises. However, native loading of system-provided BLAS/LAPACK **is** a surprise in this context. I don't want ASF's commercial distributors to get into a flap about dynamically loading binaries that were created by Apple, Intel, NVIDIA or AMD. These binaries would not be explicitly listed in the software license list that Apache carefully construct.

I propose a simple solution, which really is just the ASF's recommendation: we make the native components "optional" and make it very easy for distributors to turn them on if they understand the additional legal and technical implications. In fact, `netlib-java` already supports this... conservative upstream projects need only depend on the `core` artefact, and then end-users who want the native performance improvements can depend on `all` (or a more specific artefact, including their own): natives are an optional runtime dependency.

@dlwh would you be happy enough to change breeze's dependency to depend on `com.github.fommil.netlib:core` and give easy instructions to your upstream users to depend on `com.github.fommil.netlib:all` in order to get the native speedups? (I can even write this up for inclusion in your `README`). To be honest, it would actually help clean my inbox, because I get a lot of bug reports from users of Breeze who are confused by logging messages about natives failing to load when they have not followed the system natives instructions (e.g. they haven't installed ATLAS, so they get a warning message and then it harmlessly falls back to the Fortran or F2J implementations).

Note that the Fortran reference natives - or ATLAS binaries - are not necessarily a problem from a licensing perspective, because we can explicitly list them. But from a technical point of view, I don't think it's really worth the extra effort to give them special attention. The performance results (above) agree with all industry benchmarks (including my own and the Java Matrix Benchmarks), which say that system-optimised natives greatly outperform generically tuned implementations. Also, the Fortran implementation is only marginally faster than the F2J implementation (JVM JIT for the win!).

BTW, it might be interesting for you to run the performance tests when using the F2J backend of `netlib-java` to convince yourself of the benefit of the system natives.

Does this sound sensible?
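
For illustration, the `core` versus `all` split proposed above might look roughly like this in an sbt build. This is only a sketch: the `netlibVersion` value is a placeholder and not a version named in this thread, and the `pomOnly()` qualifier assumes that the `all` artefact is published as a POM-only aggregator.

```scala
// build.sbt (sketch; substitute a real netlib-java release for the placeholder)
val netlibVersion = "x.y.z" // hypothetical placeholder, not from this thread

// Conservative library (e.g. breeze): depend only on the `core` artefact,
// so no native binaries are pulled in transitively.
libraryDependencies += "com.github.fommil.netlib" % "core" % netlibVersion

// End user or distributor who accepts the extra legal and technical
// implications: opt in to the natives by additionally depending on `all`.
// Assumption: `all` is a POM-only artefact, hence pomOnly().
libraryDependencies += "com.github.fommil.netlib" % "all" % netlibVersion pomOnly()
```

With this arrangement the second line would live in the end user's own build and not in the library's, which keeps the natives an optional runtime dependency as described.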