Hello everyone, great to see and talk to some of you today. Here are the notes from today's meeting.
Anyone is welcome to respond to the individual points listed below, either to continue or to start a discussion.

Notes:

mlpack has grown a lot over time and many more people contribute nowadays, so it has become very hard to keep track of everything that changes. Ryan therefore went through the changes made between January 2019 and now, to get everyone on the same page:

https://www.ratml.org/misc/mlpack-meeting-slides.pdf

Additions to the slides:

Overview
------------------------------------------------

"mlpack is nearly 12 years old now (the first code was in 2007)."

That's 10 years of work on mlpack for Ryan.

"2750 stars on Github, 97k downloads (according to my server logs so that’s an undercount), 142 contributors and counting..."

This is an undercount because we can't count what is downloaded via GitHub, PyPI, Conda, or anything else. It would be nice to have more data on that, so if anybody would like to look into it, please feel free. GitHub says 142 contributors, but there are probably more, because contributions from non-GitHub users aren't counted.

New website
------------------------------------------------

The big motivation behind the new website was to automate the deployment process. Previously, whenever a new version was released, Ryan had to manually edit a bunch of HTML files to update the version (a very tedious process). What we have now is a Jekyll-based website that is regenerated nightly to update the Doxygen documentation. Adding a new version is done by putting the new release in place; the website is then rebuilt automatically by a Jenkins (build server) job. It's not perfect, so if you see any problem, please don't hesitate to open a bug report so that we can get it fixed.

Kernel Density Estimation
------------------------------------------------

Huge thanks to robertohueso: kernel density estimation using dual-tree algorithms is now part of the codebase. It can be used from the command line or from Python via the KDE binding, and you can also use it from C++ directly; it's in the methods/kde directory.

Neural Network / Reinforcement Learning
------------------------------------------------

A lot of things are in progress, and the data shown on the slide is just what has been merged so far. We're excited to see everything else merged as well.

Python binding fixes
------------------------------------------------

We now check the types of parameters passed to the Python bindings. For example, if you pass in a bool where you are supposed to pass in a matrix, the binding will now throw an exception instead of trying to run and ending in a segmentation fault.

Missing things that haven't been mentioned on the slides:

Previously we had some benchmarks on the website that aren't included anymore; we will fix that soon. We also have to check whether they are up to date.

There is an mlpack/Armadillo license issue: Armadillo switched from the Mozilla Public License 2.0 to the Apache License 2.0, and mlpack is still referencing the old license. We will have to look into that and update the license information.

The next release of mlpack will be 3.1. It has taken far too long, and part of the problem was the tedious deployment process. There are a couple of open PRs close to being finished that will probably go into mlpack 3.1. Interesting note: almost all open PRs add new features, so we don't have to wait for or prioritize PRs that fix bugs.

* All PRs mentioned in the slides that are already merged will go into the new release; that includes all the work from last year's GSoC.
* If anybody can think of something that shouldn't go into the release, please let us know.
* Some of the smaller PRs will be part of a patch release after mlpack 3.1.
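
To make the "Python binding fixes" item above a bit more concrete, here is a minimal sketch of the general idea: validate argument types up front and raise a Python exception, instead of handing a wrong type to native code. This is not the code used in mlpack's bindings; check_matrix_param and toy_binding are hypothetical names made up for this illustration, and a nested list stands in for a real matrix type.

```python
# Hypothetical sketch only: these names are NOT part of mlpack's bindings.
# A nested list of numbers stands in for a real matrix type.

def check_matrix_param(name, value):
    """Return value unchanged if it looks like a matrix, else raise TypeError."""
    # A bool (or any other non-sequence) is rejected before any work starts.
    if isinstance(value, bool) or not isinstance(value, (list, tuple)):
        raise TypeError(
            "parameter '%s' must be a matrix, got %s"
            % (name, type(value).__name__))
    for row in value:
        if not isinstance(row, (list, tuple)):
            raise TypeError("parameter '%s' must contain rows" % name)
        for x in row:
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(x, bool) or not isinstance(x, (int, float)):
                raise TypeError("parameter '%s' must contain numbers" % name)
    return value


def toy_binding(reference):
    """Stand-in for a binding entry point: validate first, then compute."""
    reference = check_matrix_param("reference", reference)
    n_rows = len(reference)
    n_cols = len(reference[0]) if reference else 0
    return n_rows, n_cols


# Passing a bool where a matrix is expected now fails cleanly.
try:
    toy_binding(True)
except TypeError as err:
    print("rejected:", err)
```

The point of the design is only that the failure happens in Python, with a readable message, before anything reaches compiled code.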

Discussion topics:

mlpack is a big project that can go in a lot of different directions, but it would be useful to have a handful of directions and people who are interested in taking the lead on them. Some interesting directions are:

* Low-power and embedded devices:
  - mlpack is written in C++, so deployment is fairly easy since you can compile it for a specific device. There is a GSoC project going in that direction. The opportunity here is to produce something that can be easily deployed and optimized for specific devices; for example, TensorFlow, which is used around the world, is a huge toolkit and is not necessarily optimized for this. Even the light version they provide targets iOS and Android, so it's not quite for embedded devices.
  - gmanlan is working on a tiny version of mlpack (dependencies removed, but still depending on BLAS and LAPACK). Some methods don't need BLAS/LAPACK, so this could be reduced even more.
  - A couple of ideas to reduce the size and the number of dependencies: strip out all unused functions, link statically, look for lightweight BLAS/LAPACK replacements, or inline the needed BLAS/LAPACK routines directly into the code.
  - There are some toolkits out there, but they are mostly built for neural networks; mlpack, in contrast, provides more than neural networks.
* Automatic selection of methods:
  - mlpack has many different implementations that give you the same result (KMeans is one example), but it's not clear which one is best. It could be an interesting project, both from the code side and from the research side, for someone to build a heuristic that automatically chooses the best algorithm for a given dataset.
  - Another idea in the same direction is to use meta machine learning.
* Better Windows accessibility:
  - We have a bunch of Windows users, but the current Windows support is limited; the main reason is a lack of expertise in that area.
  - mlpack is currently missing an easy deployment and build system for Windows, something like a pip install version.
  - Windows package managers: Chocolatey, or NuGet if you are using Visual Studio.
  - Windows is also lacking pip install for the Python bindings.
  - New users like to test mlpack as quickly as possible, so binary packages might be a good start.
* NumFOCUS:
  - NumFOCUS is an organization that we have talked to at different events.
  - They will handle donations and help us organize workshops and/or hackathons.
  - Similar to the Apache Software Foundation.
  Let us know if you have any opinion on this one; we are not going to rush anything here.
* Automated release process:
  - There are a couple of DevOps-related tasks open, so if anyone is interested in that, please feel free to send us a mail or join the IRC channel.
* Season of Docs / Google Code-in:
  - If someone would like to take part in those opportunities, please feel free to talk with us.
* Arbitrary precision data support:
  - Low-precision machine learning is very popular now; people like to train neural networks on 8-bit floating-point numbers, etc.
  - mlpack mostly requires Armadillo double-precision matrices everywhere; there are some open issues about templatizing the whole API.
  - Armadillo doesn't support low-precision floating point (only 32- and 64-bit).
  - There could be an opportunity to build an Armadillo-compatible layer/library for low-precision floating point.
* Making cutting-edge neural networks available:
  - There are some open PRs for Neural Turing Machines, Highway Networks, and a couple of other models that build on top of the existing API.
  - The models repository is a good place to show what can be done with mlpack.
* Improve the visibility of implemented methods:
  - It's somewhat difficult to figure out what mlpack implements and supports; in some cases you have to search through the code.

Happy to clarify anything.

Thanks,
Marcus

> On 4. Apr 2019, at 15:32, Ryan Curtin <[email protected]> wrote:
>
> On Fri, Mar 29, 2019 at 11:51:24PM -0400, Ryan Curtin wrote:
>> Hey there everyone,
>>
>> After some discussion, schedule wrangling, and playing with
>> videoconferencing software, we've decided that we'll have the first
>> mlpack video meeting on
>>
>> Thursday, April 4 at 1600-1700 UTC
>>
>> (so to convert that to some common time zones, from west to east: 9am
>> PST, 12pm EST, 4pm GMT, 6pm CEST, 7pm MST, 9:30pm IST), and we'll use
>> the open-source Jitsi videoconferencing software to meet at
>>
>> https://meet.mlpack.org/mlpack-meeting
>
> Hey everyone,
>
> It turns out when I made the URL above that I did not realize that the
> Jitsi software disallows hyphens in the meeting name. So, instead,
> if you're able to attend the meeting, let's meet at this URL instead:
>
> https://meet.mlpack.org/mlpackmeeting
>
> Like I mentioned before, I'll post notes afterwards and we can follow up
> on the mailing list or IRC or wherever if needed, so if you can't make
> it no worries.
>
> Sorry for the confusion and see you in 2.5 hours! Nothing ever goes
> perfect the first time.
> We will figure out what other difficulties we
> will have shortly enough. :)
>
> Thanks,
>
> Ryan
>
> --
> Ryan Curtin | "Moo."
> [email protected] | - Eugene Belford
> _______________________________________________
> mlpack mailing list
> [email protected]
> http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
