Hey Rishabh,

Totally agreed; we could definitely provide better support for forecasting methods. But I do agree with Marcus that there needs to be some reason for people to pick mlpack over other frameworks. Typically that reason would be speed or a better algorithmic implementation, but there are other possibilities too, of course.
One thing that's important regardless, though, is the API. So a big question would be: what do we use to represent time-series data? Is it seamless across mlpack algorithms? For instance, the way we represent time-series data for RNNs could be a starting place. (And even that could be changed if there were a compelling reason.) We'd also need to make sure that the representation we choose matches the representations used by other tools that prospective users might already be familiar with, so that the barrier to entry for them is a bit lower.

I hope this is helpful!

Thanks,

Ryan

On Fri, Mar 12, 2021 at 06:23:05PM +0530, RISHABH GARG wrote:
> Hello Marcus,
> I think I didn't make a point very clear in my previous email. Actually,
> what I found is that there are a couple of libraries like statsmodels and
> sktime that are dedicated just to time series forecasting, classification,
> regression etc., but I couldn't find any good open source library in C++
> that provides easy-to-use time series models. One C++ library I found is
> ALGLIB, but that too is not completely open source. Therefore, I think
> mlpack could be one of the first big open source C++ libraries that
> provides these methods.
>
> Also, the methods I mentioned in the previous email are elementary, and you
> can kind of call them the LEGO blocks of the whole of time series analysis.
> One thing that I have discovered in forecasting methods is that they are
> built progressively on top of each other. For example, if we take ARIMA, then
> it is a combination of an autoregressive model and a moving average with a
> number of differencing steps, i.e. a combination of three different methods.
> The point I am trying to make is that complex models are built on top of
> many simpler models.
>
> Thus, for motivation and for what the minimum expectations from our API
> should be, we can reference the above Python libraries as they are quite mature.
> But I don't think we can do benchmarking with them since C++ will surely
> beat Python in execution time.
>
> Whatever I have mentioned above is just scratching the surface. There is
> a lot of research going on in the field, but I think we should first start
> with the foundations.
>
> Please let me know if I missed something or if anything needs further
> explanation. Also, if you like, I can provide more details related
> to implementation and integration with the existing codebase, or API-related
> details.
>
> Sorry if the mail got too big. Thanks for reading :)
>
> Regards
> Rishabh Garg
>
> On Thu, Mar 11, 2021 at 10:00 PM Marcus Edel <[email protected]>
> wrote:
>
> > Hello Rishabh,
> >
> > thanks for reaching out and welcome to the community. I like the idea,
> > but we should check how mlpack can differentiate from the existing methods:
> > is there a recent method that is not available in other frameworks (check
> > for papers), can we make an existing method faster, etc. As you said, there
> > are frameworks out there that implemented the methods already, so I think
> > it's a good idea to check what mlpack can bring to the table.
> >
> > Thanks,
> > Marcus
> >
> > On 10. Mar 2021, at 10:27, RISHABH GARG <[email protected]> wrote:
> >
> > Hello everyone,
> >
> > As most of us know, time series analysis and forecasting methods are
> > quite useful in the real world. In most real-life datasets, we
> > see some or many time-dependent features. Thus, they are highly useful and
> > powerful methods. Therefore, in my opinion every machine learning / data
> > science library should have these methods. But unfortunately, mlpack does
> > not have any time series method implemented yet :(
> >
> > Therefore I would like to propose this as a project idea for GSoC 2021:
> > implementing time series forecasting models.
> > Some of the most famous and
> > commonly used forecasting methods are listed below (mostly taken from issue
> > #2668):
> >
> > 1. Naive
> > 2. Seasonal naive
> > 3. Seasonal-trend LOESS decomposition
> > 4. Holt-Winters
> > 5. Exponential smoothing
> > 6. ARIMA
> > 7. Autoregression
> >
> > Over the limited time of GSoC 2021, it might not be possible to implement
> > all of these, so I can pick 2-3 methods from this list and implement them.
> > Also, these methods will require some basic utilities for their
> > implementations, so those would also come under the scope of this project.
> >
> > This would be a really interesting project for me to work on. I have
> > recently done a Data Science course at my university where I came across a
> > couple of these, and I was fascinated by how useful these methods can be in
> > real life. I have already done some work implementing the Naive model in
> > #2789 and I would love to continue it over the coming summer.
> >
> > I request all mentors to see if this could be a nice GSoC project and if
> > anyone would like to mentor this project.
> >
> > The valuable feedback of anyone from the mlpack community will be
> > immensely helpful.
> >
> > Thanks and regards,
> > Rishabh Garg

--
Ryan Curtin | "You can think about it... but don't do it."
[email protected] | - Sheriff Justice
_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
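As a rough illustration of the simplest building blocks named in this thread (the naive and seasonal naive forecasts from the proposal's list, and the differencing step that ARIMA combines with its autoregressive and moving-average parts), here is a minimal C++ sketch. It is not mlpack code and not the implementation from #2789: the function names NaiveForecast, SeasonalNaiveForecast, and Difference are hypothetical, and a plain std::vector<double> is only a stand-in for whatever time-series representation mlpack might settle on.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; none of these are mlpack APIs.

// Naive forecast: every future step repeats the last observed value.
std::vector<double> NaiveForecast(const std::vector<double>& series,
                                  std::size_t horizon)
{
  assert(!series.empty());
  return std::vector<double>(horizon, series.back());
}

// Seasonal naive forecast: each future step repeats the value observed at the
// same position in the last full season (`period` steps long).
std::vector<double> SeasonalNaiveForecast(const std::vector<double>& series,
                                          std::size_t period,
                                          std::size_t horizon)
{
  assert(period > 0 && series.size() >= period);
  std::vector<double> forecast(horizon);
  const std::size_t start = series.size() - period;
  for (std::size_t h = 0; h < horizon; ++h)
    forecast[h] = series[start + (h % period)];
  return forecast;
}

// First-order differencing: diff[t] = series[t + 1] - series[t].
// Applying it d times gives the d differencing steps used by ARIMA.
std::vector<double> Difference(const std::vector<double>& series)
{
  assert(series.size() >= 2);
  std::vector<double> diff(series.size() - 1);
  for (std::size_t t = 0; t + 1 < series.size(); ++t)
    diff[t] = series[t + 1] - series[t];
  return diff;
}
```

Keeping each piece as a small free function mirrors the "LEGO blocks" point above: a more complex model could be assembled by, for example, differencing a series and then fitting a simpler model to the result.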
