What part of my message did you miss?

> The use of `*.` instead of `.*` is an excellent idea.
> 
> In Arraymancer I chose `.*` to be the same as Julia, but it indeed breaks 
> precedence rules. I can deprecate it in favour of `*.` as well.
> 
> Ideally @andreaferretti can adopt this dot convention as well in neo, instead 
> of |*| for the Hadamard product.

I said that this is an excellent idea, and I even proposed changing Arraymancer 
to use that convention, and neo as well, so we have a unified Nim ecosystem.

> Results in 9s instead of 5. If only some weren't obsessed with performance 
> they would see the issue.

Yeah, I am obsessed with performance: I don't like it when my hardware is not 
used to its full extent, and when people use powerful hardware as an excuse to 
write sloppy code (I'm looking at you, Electron tray apps like 
"[nimble](https://github.com/Maybulb/Nimble)", which uses 200MB of memory to sit 
in my tray).

Furthermore, people in the machine learning and high-performance computing 
communities write ML training algorithms or physics simulations that run for 
hours if not days. A 250x slowdown in a compute kernel means that instead of 
training for 3 hours, it would take a literal month. It would also mean that I 
wouldn't be able to compete in 2-hour machine learning competitions like "[the 
best data scientist of 
France](https://github.com/mratsim/meilleur-data-scientist-france-2018)"/Data 
Science Olympics.

It also completely goes against why everyone is wrapping C, C++, and Fortran 
with Python or R, why Cray is writing Chapel, why Julia raised $8M for their 
project, and why Google, Intel, Nvidia, Qualcomm, and Huawei are spending 
hundreds of millions to **build custom hardware just to do matrix 
multiplication the fastest**. It's also why people use float16 instead of 
float32: it's 2x faster for matrix multiplication. It is also why [Intel 
acquired Nervana Systems for $350+M in 
2016](https://venturebeat.com/2016/08/09/intel-acquires-deep-learning-startup-nervana/), 
and the main draw of Intel hardware (Intel MKL BLAS and AVX-512) compared to 
AMD.

I know the needs of my domain, AI and data science, and speed matters a lot 
there; I expect the same is true for physics and 
[biostatistics](https://github.com/mratsim/Arraymancer/issues/356#issuecomment-500004552).

Quote from @brentp:

> any chance of a randomized pca?
> when using solver=randomized and that finishes in ~5 seconds for something 
> that takes arraymancer 250 seconds (shape is [2504, 16000])
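For context on why the randomized solver is so much faster: it only touches the data through a few tall-skinny matrix products plus an SVD of a small projected matrix, instead of decomposing the full matrix. Here is a minimal NumPy sketch of the standard randomized SVD scheme (Halko, Martinsson & Tropp); the function name and parameters are illustrative, not Arraymancer's or scikit-learn's API:

```python
import numpy as np

def randomized_svd(A, k, n_oversample=10, n_iter=2, seed=None):
    """Approximate top-k SVD of A via a randomized range finder."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sample the range of A with a random Gaussian test matrix.
    Omega = rng.standard_normal((n, k + n_oversample))
    Y = A @ Omega
    # Power iterations sharpen accuracy when singular values decay slowly.
    for _ in range(n_iter):
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)      # orthonormal basis for the sampled range
    B = Q.T @ A                 # small (k + oversample) x n projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# Usage: an exactly rank-40 matrix is recovered almost perfectly.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 40)) @ rng.standard_normal((40, 300))
U, s, Vt = randomized_svd(A, k=40, seed=0)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(err)  # near machine precision for an exactly low-rank matrix
```

The cost is dominated by a handful of `(m, n) @ (n, k)` products, which is why it scales so much better than a full decomposition on a [2504, 16000] matrix.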

This is not a "[I compile my Gentoo with -funroll-loops 
-fomg-optimize](https://wiki.gentoo.org/wiki/GCC_optimization#But_I_get_better_performance_with_-funroll-loops_-fomg-optimize.21)" 
situation; it is rooted in where people and companies spend their time and money.

I keep a close eye on data science workflows, hype, hardware projects, and 
software stacks, and I have seen a lot of deep-learning compiler developer job 
offers from both Intel and Nvidia since March. Even watching the GitHub repos 
of Intel and Facebook, when the matrix multiplication backend was changed to 
improve AMD support, the first question was: [what's the 
performance?](https://github.com/pytorch/pytorch/issues/26534#issuecomment-536692577)

The other thing that matters a lot is ergonomics, which is why people write R, 
Python, Matlab, and Julia rather than raw C, C++, or Fortran. It happens that 
Nim can provide both speed and ergonomics, and this is why I started writing 
Arraymancer in Nim in the first place and stayed in the community.

Besides, your example benchmark is not a good one: it only uses simple for 
loops. Matrix multiplication is the key part and the main reason why people use 
BLAS. There is a reason why we have had [17,000+ papers on matrix 
multiplication](https://scholar.google.com/scholar?start=0&q=%22matrix+multiplication%22&hl=fr&as_sdt=0,5&as_ylo=2018) 
since 2018, with more and more on how to build custom hardware for it.
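To see how misleading a plain for-loop benchmark can be, here is a sketch comparing a textbook triple-loop matrix multiplication with NumPy's BLAS-backed `@` operator; the exact speedup depends on the machine and the BLAS build, but the gap is typically several orders of magnitude:

```python
import time
import numpy as np

def naive_matmul(A, B):
    """Textbook triple-loop matrix multiplication: O(n^3) scalar ops,
    no vectorization, no cache blocking."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            s = 0.0
            for p in range(k):
                s += A[i, p] * B[p, j]
            C[i, j] = s
    return C

n = 128
rng = np.random.default_rng(42)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

t0 = time.perf_counter(); C1 = naive_matmul(A, B); t_naive = time.perf_counter() - t0
t0 = time.perf_counter(); C2 = A @ B;              t_blas = time.perf_counter() - t0

print(np.allclose(C1, C2))   # same numerical result
print(t_naive / t_blas)      # speedup factor; machine-dependent, but large
```

The two versions compute the same thing; the difference is entirely in SIMD, cache blocking, and threading, which simple element-wise for-loop benchmarks never exercise.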

I inferred from your original post that what you wanted was a pure Nim library; 
I even provided you with a suggestion, using Laser code, to reach BLAS 
performance without depending on BLAS.

Lastly, on **Sunday**, October 20, I had a flight from Tokyo to Hong Kong and 
then from Hong Kong to Paris. It's unfortunate, but I missed most of the 
messages from that day. I did suggest using parentheses, though.
