What part of my message did you miss?

> The use of `*.` instead of `.*` is an excellent idea.
>
> In Arraymancer I chose `.*` to be the same as Julia, but it indeed breaks precedence rules. I can deprecate it in favour of `*.` as well.
>
> Ideally @andreaferretti can adopt this dot convention in neo as well, instead of `|*|` for the Hadamard product.
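For context on why the leading dot breaks precedence: Nim derives an operator's precedence from its *first* character, so `*.` and `+.` keep the familiar precedence of `*` and `+`, while dot-first spellings such as `.*` and `.+` all share a single precedence and lose the usual arithmetic grouping. A minimal sketch with hypothetical scalar stand-ins (Arraymancer's real operators are defined on tensors):

```nim
# Sketch with scalar stand-ins: Nim derives an operator's precedence from
# its FIRST character, so `*.` (first char '*') binds tighter than
# `+.` (first char '+'), exactly like * and +.

proc `*.`(a, b: int): int = a * b   # stand-in for elementwise multiply
proc `+.`(a, b: int): int = a + b   # stand-in for elementwise add

assert 2 +. 3 *. 4 == 14   # parsed as 2 +. (3 *. 4), as expected

echo "ok"
```

With dot-first operators this grouping would not hold, since `.+` and `.*` no longer start with `+` and `*`; that is the precedence breakage being discussed.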
I said that this is an excellent idea, and I even proposed changing Arraymancer to use that convention, and neo as well, so that we have a unified Nim ecosystem.

> Results in 9s instead of 5. If only some weren't obsessed with performance they would see the issue.

Yes, I am obsessed with performance. I don't like it when my hardware is not used to its full extent, and when people use powerful hardware as an excuse to write sloppy code (I'm looking at you, Electron tray apps like "[nimble](https://github.com/Maybulb/Nimble)", which uses 200 MB of memory just to sit in my tray).

Furthermore, people in the machine learning and high-performance computing communities write ML training algorithms or physics simulations that run for hours if not days. A 250x slowdown in a compute algorithm means that instead of training for 3 hours, I would train for a literal month. It would also mean that I couldn't compete in 2-hour machine learning competitions like "[the best data scientist of France](https://github.com/mratsim/meilleur-data-scientist-france-2018)"/the Data Science Olympics.

It also goes completely against the reason everyone is wrapping C, C++ and Fortran with Python or R, why Cray is writing Chapel, why Julia raised $8M for their project, and why Google, Intel, Nvidia, Qualcomm and Huawei are spending hundreds of millions to **build custom hardware just to do matrix multiplication the fastest**. It's also why people use float16 instead of float32: it's 2x faster for matrix multiplication. It is also why [Intel acquired Nervana Systems for $350+M in 2016](https://venturebeat.com/2016/08/09/intel-acquires-deep-learning-startup-nervana/), and it is the main draw of Intel hardware (Intel MKL BLAS and AVX-512) compared to AMD.

I know the needs of my domain, AI and data science, and speed matters a lot there; I expect the same is true for physics and [biostatistics](https://github.com/mratsim/Arraymancer/issues/356#issuecomment-500004552). Quote from @brentp:

> any chance of a randomized pca?
> when using solver=randomized and that finishes in ~5 seconds for something that takes arraymancer 250 seconds (shape is [2504, 16000])

This is not a case of "[I compile my Gentoo with -funroll-loops -fomg-optimize](https://wiki.gentoo.org/wiki/GCC_optimization#But_I_get_better_performance_with_-funroll-loops_-fomg-optimize.21)"; it is rooted in where people and companies spend their time and money. I keep a close eye on data science workflows, hype, hardware projects and software stacks, and I have seen a lot of deep-learning-compiler job offers from both Intel and Nvidia since March. Even if you just watch the GitHub repos of Intel and Facebook: when the matrix multiplication backend was changed to improve AMD support, the first question was [what's the performance?](https://github.com/pytorch/pytorch/issues/26534#issuecomment-536692577)

The other thing that matters a lot is ergonomics, which is why people write R, Python, Matlab and Julia rather than raw C, C++ or Fortran. It happens that Nim can provide both speed and ergonomics, and that is why I started writing Arraymancer in Nim in the first place and stayed in the community.

Besides, your example benchmark is not a good one: it only uses simple for loops. Matrix multiplication is the key part and the main reason people use BLAS. There is a reason we have [17000+ papers on matrix multiplication](https://scholar.google.com/scholar?start=0&q=%22matrix+multiplication%22&hl=fr&as_sdt=0,5&as_ylo=2018) since 2018, with more and more of them on building custom hardware for it.

I inferred from your original post that what you wanted was a pure Nim library, and I even provided you with a suggestion, using Laser code, to reach BLAS performance without depending on BLAS.

Lastly, on **Sunday** October 20 I had a flight from Tokyo to Hong Kong and then from Hong Kong to Paris. It's unfortunate, but I missed most of the messages from that day. I did suggest using parentheses, though.
