What part of my message did you miss?

> The use of `*.` instead of `.*` is an excellent idea.
> 
> In Arraymancer I chose `.*` to be the same as Julia, but it indeed breaks 
> precedence rules. I can deprecate it in favour of `*.` as well.
> 
> Ideally @andreaferretti can adopt this dot convention as well in neo, instead 
> of |*| for the Hadamard product.

I said that this is an excellent idea, and I even proposed changing Arraymancer 
to use that convention, and neo as well, so we have a unified Nim ecosystem.

> Results in 9s instead of 5. If only some weren't obsessed with performance 
> they would see the issue.

Yeah, I am obsessed with performance: I don't like it when my hardware is not 
used to its full extent, and when people use powerful hardware as an excuse to 
write sloppy code (I'm looking at you, Electron tray apps like 
"[nimble](https://github.com/Maybulb/Nimble)", which uses 200MB of memory to sit 
in my tray).

Furthermore, people in the machine learning and high-performance computing 
communities write ML training algorithms or physics simulations that run for 
hours if not days. A 250x slowdown in a compute kernel means that instead of 
training for 3 hours, it would take a literal month. It would also mean that I 
wouldn't be able to compete in 2-hour machine learning competitions like "[the 
best data scientist of 
France](https://github.com/mratsim/meilleur-data-scientist-france-2018)"/Data 
Science Olympics.

It also completely goes against why everyone is wrapping C, C++, and Fortran 
with Python or R, why Cray is writing Chapel, why Julia raised $8M for their 
project, and why Google, Intel, Nvidia, Qualcomm, and Huawei are spending 
hundreds of millions to **build custom hardware just to do matrix 
multiplication the fastest**. It's also why people use float16 instead of 
float32: it's 2x faster for matrix multiplication. It is also why [Intel 
acquired Nervana Systems for $350+M in 
2016](https://venturebeat.com/2016/08/09/intel-acquires-deep-learning-startup-nervana/), 
and the main draw of Intel hardware (Intel MKL BLAS and AVX-512) compared to 
AMD.

I know the needs of my domain, AI and data science, and speed matters a lot 
there; I expect the same is true for physics and 
[biostatistics](https://github.com/mratsim/Arraymancer/issues/356#issuecomment-500004552).

Quote from @brentp:

> any chance of a randomized pca?
> when using solver=randomized and that finishes in ~5 seconds for something 
> that takes arraymancer 250 seconds (shape is [2504, 16000])
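For context on why the randomized solver is so much faster: it only touches the data through a few tall-skinny matrix products plus an SVD of a small projected matrix, instead of decomposing the full matrix. Here is a minimal NumPy sketch of the standard randomized SVD scheme (Halko, Martinsson & Tropp); the function name and parameters are illustrative, not Arraymancer's or scikit-learn's API:

```python
import numpy as np

def randomized_svd(A, k, n_oversample=10, n_iter=2, seed=None):
    """Approximate top-k SVD of A via a randomized range finder."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sample the range of A with a random Gaussian test matrix.
    Omega = rng.standard_normal((n, k + n_oversample))
    Y = A @ Omega
    # Power iterations sharpen accuracy when singular values decay slowly.
    for _ in range(n_iter):
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)      # orthonormal basis for the sampled range
    B = Q.T @ A                 # small (k + oversample) x n projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# Usage: an exactly rank-40 matrix is recovered almost perfectly.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 40)) @ rng.standard_normal((40, 300))
U, s, Vt = randomized_svd(A, k=40, seed=0)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(err)  # near machine precision for an exactly low-rank matrix
```

The cost is dominated by a handful of `(m, n) @ (n, k)` products, which is why it scales so much better than a full decomposition on a [2504, 16000] matrix.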

This is not a "[I compile my Gentoo with -funroll-loops 
-fomg-optimize](https://wiki.gentoo.org/wiki/GCC_optimization#But_I_get_better_performance_with_-funroll-loops_-fomg-optimize.21)" 
situation; it is rooted in where people and companies spend their time and money.

I keep a close eye on data science workflows, hype, hardware projects, and 
software stacks, and I have seen a lot of deep-learning compiler developer job 
offers from both Intel and Nvidia since March. Even watching the GitHub repos 
of Intel and Facebook, when the matrix multiplication backend was changed to 
improve AMD support, the first question was: [what's the 
performance?](https://github.com/pytorch/pytorch/issues/26534#issuecomment-536692577)

The other thing that matters a lot is ergonomics, which is why people write R, 
Python, Matlab, and Julia rather than raw C, C++, or Fortran. It happens that 
Nim can provide both speed and ergonomics, and this is why I started writing 
Arraymancer in Nim in the first place and stayed in the community.

Besides, your example benchmark is not a good one: it only uses simple for 
loops. Matrix multiplication is the key part and the main reason why people use 
BLAS. There is a reason why we have had [17,000+ papers on matrix 
multiplication](https://scholar.google.com/scholar?start=0&q=%22matrix+multiplication%22&hl=fr&as_sdt=0,5&as_ylo=2018) 
since 2018, with more and more on how to build custom hardware for it.
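To see how misleading a plain for-loop benchmark can be, here is a sketch comparing a textbook triple-loop matrix multiplication with NumPy's BLAS-backed `@` operator; the exact speedup depends on the machine and the BLAS build, but the gap is typically several orders of magnitude:

```python
import time
import numpy as np

def naive_matmul(A, B):
    """Textbook triple-loop matrix multiplication: O(n^3) scalar ops,
    no vectorization, no cache blocking."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            s = 0.0
            for p in range(k):
                s += A[i, p] * B[p, j]
            C[i, j] = s
    return C

n = 128
rng = np.random.default_rng(42)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

t0 = time.perf_counter(); C1 = naive_matmul(A, B); t_naive = time.perf_counter() - t0
t0 = time.perf_counter(); C2 = A @ B;              t_blas = time.perf_counter() - t0

print(np.allclose(C1, C2))   # same numerical result
print(t_naive / t_blas)      # speedup factor; machine-dependent, but large
```

The two versions compute the same thing; the difference is entirely in SIMD, cache blocking, and threading, which simple element-wise for-loop benchmarks never exercise.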

I inferred from your original post that what you wanted was a pure Nim library; 
I even provided you with a suggestion, using Laser code, to reach BLAS 
performance without depending on BLAS.

Lastly, on **Sunday**, October 20, I had a flight from Tokyo to Hong Kong and 
then from Hong Kong to Paris. It's unfortunate, but I missed most of the 
messages from that day. I did suggest using parentheses, though.
