It's fine to prototype it. Because users can already get BLAS support by enabling a profile, I think it's worth checking whether perf is at least comparable before adding this as another option. Until then, it could simply live in a separate module or library if it's desirable to release. This may be a nice testing ground to see how far the Vector API can substitute for BLAS operations.
On Wed, Dec 16, 2020 at 4:41 AM Ludovic Henry <luhe...@microsoft.com> wrote:
> Hi,
>
> Thank you for the feedback. I’ll work on the profile-based approach to
> selectively compile this VectorBLAS class in. As for run time, I haven’t
> specifically used a reflection-based approach, but a simpler
> `try { new VectorBLAS() } catch (NoClassDefFoundError) { new F2jBLAS() }`.
> I’ll submit a PR against github.com/apache/spark with this change. Should I
> also file a bug in Jira?
>
> On a side note, I worked yesterday on extracting this code into a
> standalone project [1]. The goal is not so much for Spark to depend on it
> (though that would be possible) as to make it easier to develop, test, and
> benchmark new implementations on my end.
>
> Thank you,
> Ludovic
>
> [1] https://github.com/luhenry/blas
>
> From: Erik Krogen <xkro...@apache.org>
> Sent: Tuesday, 15 December 2020 17:33
> To: Sean Owen <sro...@gmail.com>
> Cc: Ludovic Henry <luhe...@microsoft.com>; dev@spark.apache.org; Bernhard
> Urban-Forster <beu...@microsoft.com>
> Subject: Re: Usage of JDK Vector API in ML/MLLib
>
> Regarding selective compilation, you can hide sources behind a Maven
> profile such as `-Pvectorized`. Check out what we do to switch between the
> `hive-1.2` and `hive-2.3` profiles, where different source directories are
> picked up at compile time (the hive-1.2 profile was recently removed, so you
> might have to go back a little in git history). This won't do it
> automatically based on JDK version, but it's probably good enough. At
> runtime you can more easily do a JDK version check -- I agree with Sean on
> loading via reflection.
>
> Personally, I see no reason not to start adding this support in
> preparation for broader adoption of JDK 16, provided that it is properly
> protected behind flags. This could be a big win for installations which
> haven't gone through the process of installing native BLAS libs.
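For illustration, the run-time fallback discussed above (try the vectorized class, otherwise use the pure-Java one) could look roughly like the Java sketch below. It is a sketch under assumptions, not Spark's code: the `Blas` interface, `ScalarBlas`, and the class name `example.VectorBlas` are hypothetical stand-ins for the BLAS abstraction, F2jBLAS, and VectorBLAS mentioned in the thread. It loads by name via reflection and also catches `LinkageError`, which covers `NoClassDefFoundError`.

```java
final class BlasSelector {
    /** Hypothetical minimal BLAS surface; only dot is shown. */
    public interface Blas {
        double dot(double[] x, double[] y);
        String name();
    }

    /** Plain-Java fallback, playing the role F2jBLAS has in the thread. */
    public static final class ScalarBlas implements Blas {
        @Override
        public double dot(double[] x, double[] y) {
            double sum = 0.0;
            for (int i = 0; i < x.length; i++) {
                sum += x[i] * y[i];
            }
            return sum;
        }

        @Override
        public String name() {
            return "scalar";
        }
    }

    /**
     * Tries to instantiate className reflectively and falls back to the
     * scalar implementation if the class is missing, cannot be linked,
     * cannot be constructed, or does not implement Blas.
     */
    public static Blas load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (Blas) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            return new ScalarBlas();
        }
    }

    public static void main(String[] args) {
        // "example.VectorBlas" is a hypothetical class name (assumption).
        Blas blas = load("example.VectorBlas");
        System.out.println("Using BLAS implementation: " + blas.name());
        System.out.println(blas.dot(new double[] {1, 2, 3}, new double[] {4, 5, 6}));
    }
}
```

Loading by name keeps the caller free of any compile-time reference to the vectorized class, so the caller itself still compiles and loads on older JDKs.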
>
> On Tue, Dec 15, 2020 at 7:10 AM Sean Owen <sro...@gmail.com> wrote:
>
> Yes it's intriguing, though as you say not readily available in the wild
> yet.
>
> I would also expect native BLAS to outperform f2j, so yeah, that's the
> interesting question: whether this is a win over native code or not.
>
> I suppose the upside is that eventually we may expect this API to be
> available in all JVMs, not just those with native libraries added at runtime.
>
> I wonder if a short-term goal would be to ensure that these calls are
> simply abstracted away, which they should already be, so it's easy to plug
> in this new 'BLAS' implementation. I'm sure it's possible to load this
> selectively via reflection, as that's what the current libraries do.
>
> And there may be additional code paths that could benefit from these
> operations but don't use them yet.
>
> On Tue, Dec 15, 2020 at 8:30 AM Ludovic Henry
> <luhe...@microsoft.com.invalid> wrote:
>
> Hello,
>
> Over the past few days, I’ve looked into using the new Vector API [1] to
> accelerate some BLAS operations straight from Java. You can find a gist at
> [2] containing most of the changes in
> mllib-local/src/main/scala/org/apache/spark/ml/linalg/BLAS.scala.
>
> To measure performance, I’ve added a BLASBenchmark.scala [3] at
> mllib-local/src/test/scala/org/apache/spark/ml/linalg/BLASBenchmark.scala.
> I do see some promising speedups, especially compared to F2jBLAS. I’ve
> unfortunately not been able to install OpenBLAS locally and compare
> performance to native, but I would still expect native to be faster,
> especially on large inputs. See [4] for an f2j vs. vector performance
> comparison.
>
> The primary blocker is that the Vector API is only available in incubator
> mode, starting with JDK 16. We can have an easy run-time check for whether
> we can use the Vectorized BLAS. But to compile the Vectorized BLAS class, we
> need JDK 16+.
> Spark 3.0+ does compile with JDK 16 (it works locally), but I don’t know
> how to selectively compile sources based on the JDK version used at
> compile time.
>
> But much more importantly, I want to get your feedback before I explore
> this idea further. Technically, it is feasible, and we’ll observe a
> speedup whenever native BLAS is not installed. Moreover, I am focusing
> solely on ML/MLLib for now. However, there is still graphx (I haven’t
> checked if there is anything vectorizable) and even more explicit use of
> the Vector API in catalyst, which is a much bigger project.
>
> Thank you,
> Ludovic Henry
>
> [1] https://openjdk.java.net/jeps/338
> [2] https://gist.github.com/luhenry/6b24ac146a110143ad31736caf7250e6#file-blas-scala
> [3] https://gist.github.com/luhenry/6b24ac146a110143ad31736caf7250e6#file-blasbenchmark-scala
> [4] https://gist.github.com/luhenry/6b24ac146a110143ad31736caf7250e6#file-f2j-vs-vector-log
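As a rough illustration of the "easy run-time check" mentioned in the original message, the sketch below derives the JDK feature version from the standard `java.specification.version` system property and compares it against 16. The class and method names are hypothetical. A version check alone is presumably necessary but not sufficient: while the Vector API is incubating, the `jdk.incubator.vector` module also has to be enabled (for example with `--add-modules jdk.incubator.vector`), so a guarded class load would still be needed on top of this.

```java
final class JdkVersionCheck {
    /**
     * Extracts the feature version from a java.specification.version
     * value. JDK 8 and earlier report values such as "1.8"; JDK 9 and
     * later report "9", "11", "16", and so on.
     */
    static int featureVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            // Legacy "1.x" scheme: the feature number is the second component.
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** True when the JDK is new enough to ship the incubating Vector API. */
    static boolean vectorApiMayBeAvailable(String specVersion) {
        return featureVersion(specVersion) >= 16;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("JDK feature version: " + featureVersion(spec));
        System.out.println("Vector API candidate: " + vectorApiMayBeAvailable(spec));
    }
}
```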