On Mon, Dec 04, 2023 at 01:19:42PM +0200, Ryan Mitchley wrote:
> Hi all,
>
> I am aware of some options in Armadillo for Gaussian Mixture Model
> clustering. Is anyone aware of performant algorithms (in MLpack or
> elsewhere) for performing iterative / online clustering (also called
> streaming clustering) in particular? My interests are in iterative
> cluster estimation, with downdating of samples (i.e. data expires).
>
> This particular combination of requirements has seemed to be challenging.
>
> I am aware of xokde++, which seems to be very promising (online-KDE):
> https://arxiv.org/abs/1606.02608
> When I examined the associated code, though, it seemed to be very much a
> research demonstrator artifact. It looks like it needs a fair amount of
> development and refinement.

Hey Ryan,

I don't know if this fully satisfies your requirements, but the GMM class in
mlpack does have the option to use an existing model as a starting point for
training. So although it may not be the most efficient way to do things, you
could train a model on your original data, then remove the original data that
has expired, add the new data, and train again. It isn't *quite* online GMMs
in the way you were thinking, but it might at least be a step in the right
direction.

Hope that helps. Personally I don't know xokde++, but perhaps some of the
ideas there could be adapted and cleaned up into production code too.

Thanks,

Ryan

-- 
Ryan Curtin       | "Don't fight it son. Confess quickly! If you hold out too
[email protected] | long you could jeopardize your credit rating." - Guard
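
A minimal sketch of the retrain-from-the-previous-model idea described above,
for illustration only. It does not call mlpack: it is a toy 1-D, pure-Python
EM, and every name in it (`em_step`, `fit`, `sliding_window_demo`, the window
size, the synthetic data) is hypothetical. In mlpack itself the warm start
should correspond to the `useExistingModel` argument of `GMM::Train()`. Data
expiry is modelled here with a fixed-size window, which is an assumption and
not the only way to downdate samples.

```python
import math
import random
from collections import deque


def em_step(data, weights, means, variances, min_var=1e-6):
    """Run one EM iteration for a 1-D Gaussian mixture; return new parameters."""
    n = len(data)
    # E-step: responsibility of each component for each sample.
    resp = []
    for x in data:
        dens = [
            w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        ]
        total = sum(dens) or 1e-300  # guard against underflow to zero
        resp.append([d / total for d in dens])
    # M-step: re-estimate each component from its weighted samples.
    new_w, new_m, new_v = [], [], []
    for j in range(len(weights)):
        nj = sum(r[j] for r in resp)
        if nj < 1e-12:
            # Component lost all support: keep its previous parameters.
            new_w.append(weights[j])
            new_m.append(means[j])
            new_v.append(variances[j])
            continue
        mj = sum(r[j] * x for r, x in zip(resp, data)) / nj
        vj = sum(r[j] * (x - mj) ** 2 for r, x in zip(resp, data)) / nj
        new_w.append(nj / n)
        new_m.append(mj)
        new_v.append(max(vj, min_var))
    scale = sum(new_w)
    return [w / scale for w in new_w], new_m, new_v


def fit(data, params, iterations):
    """Run EM starting from `params` (the warm start) instead of a fresh init."""
    weights, means, variances = params
    for _ in range(iterations):
        weights, means, variances = em_step(data, weights, means, variances)
    return weights, means, variances


def sliding_window_demo(seed=0, window_size=400):
    """Train once, then repeatedly expire old data, add new data, and retrain."""
    rng = random.Random(seed)
    window = deque(maxlen=window_size)  # oldest samples expire automatically

    # Original data: two clusters near 0 and 5.
    for _ in range(window_size // 2):
        window.append(rng.gauss(0.0, 1.0))
        window.append(rng.gauss(5.0, 1.0))
    params = fit(list(window), ([0.5, 0.5], [-1.0, 6.0], [1.0, 1.0]), 50)

    # New data arrives in batches; the second cluster has drifted to 8.
    for _ in range(10):
        for _ in range(40):
            window.append(rng.gauss(0.0, 1.0))
            window.append(rng.gauss(8.0, 1.0))
        # Warm start: reuse the previous model, so few iterations are needed.
        params = fit(list(window), params, 10)
    return params, window


if __name__ == "__main__":
    (weights, means, variances), _ = sliding_window_demo()
    print("weights:", weights)
    print("means:", means)
    print("variances:", variances)
```

Each update still costs a pass over the whole window per EM iteration, so this
is batch retraining with a good initial guess rather than a true online
update. The warm start mainly helps by cutting the number of iterations needed
after each batch.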
