1) Could the flexibility in weighting new data you're thinking about be fit into an optional argument to update!()?

3) Agreed. Right now a few methods do the opposite of that (the singleton method wraps the batch one), with something like

    update!(obj, y::Float64) = update!(obj, [y])
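To make points 1) and 3) concrete, here is a minimal sketch of a singleton update with an optional weight argument, plus a batch method that just loops over it. The `RunningMean` type, its fields, and the `weight` argument are hypothetical names for illustration only, not the current OnlineStats or StreamStats API. The sketch assumes the statistic is stored as an average rather than a sum.

```julia
# Hypothetical type for illustration; not the actual OnlineStats/StreamStats API.
mutable struct RunningMean
    mean::Float64   # current (weighted) average
    n::Int          # number of observations seen so far
end

RunningMean() = RunningMean(0.0, 0)

# Singleton update. The default weight 1/(n + 1) reproduces the ordinary
# arithmetic mean; passing a constant weight (e.g. 0.1) instead gives
# exponential weighting of new data.
function update!(obj::RunningMean, y::Real, weight::Float64 = 1 / (obj.n + 1))
    obj.n += 1
    obj.mean += weight * (y - obj.mean)   # update the average directly, no running sum
    return obj
end

# Batch update written as a loop over the singleton method.
function update!(obj::RunningMean, ys::AbstractVector{<:Real})
    for y in ys
        update!(obj, y)
    end
    return obj
end

# Current value of the statistic.
state(obj::RunningMean) = obj.mean
```

With this layout, `update!(obj, [1.0, 2.0, 3.0])` and three separate singleton calls leave `obj` in the same state, and the caller only pays for the weight argument when it is actually supplied.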
I've at least started an issue for a redesign to get a discussion going: https://github.com/joshday/OnlineStats.jl/issues/2. I'd be interested in hearing your thoughts, as I'm a statistician pretending to be a programmer.

On Sunday, April 26, 2015 at 1:38:02 PM UTC-4, Tom Breloff wrote:
>
> 1) More flexibility is certainly good, as long as it doesn't impact
> performance or readability. I would also like additional flexibility on
> weighting new data.
> 2) aside from possible overflow, updating averages instead of sums is
> probably more performant when you call state() more often than update!()
> 3) i also would probably like both batch and singleton updates, but with
> well designed code you may be able to just wrap your singleton update
> method in a loop.
>
> My use case is in algorithmic trading, so potentially calling an update!()
> method very frequently. As such, I'd always have an eye towards performance
> and any implementation would reflect that.
>
> I'd love to have a discussion about any redesign/merge. It sounds to me
> like OnlineStats is the more natural destination for any merge... But maybe
> John should weigh in?
>
>
> On Apr 26, 2015, at 8:51 AM, Josh Day wrote:
>
> I emailed John Myles White a few months back about merging. One of his
> concerns was that OnlineStats looks more ambitious, but he wanted to work
> together. I was focused on the implementation progress to show off for my
> oral prelim (I'm a PhD student in statistics), so nothing ever came of it.
> Maybe now is the time to pull the trigger on merging.
>
> I have a few minor concerns:
> 1) I think there needs to be more flexibility in the abstract type
> structure of StreamStats. I'm currently using the same types for
> OnlineStats, but I've been putting in some thought on how to improve it.
>
> 2) The "sufficient statistics" in OnlineStats types are based on averages
> to avoid overflow (StreamStats based on sums)
>
> 3) I would like OnlineStats to allow both batch and singleton updates
> (StreamStats uses singletons)
>
>
> I'm definitely open for collaboration. What are the goals you're aiming
> for?
>
>
> On Friday, April 24, 2015 at 5:13:15 PM UTC-4, Tom Breloff wrote:
>>
>> I'm considering writing packages for the following online (i.e. updating
>> models on the fly as new data arrives) techniques, but this functionality
>> might exist already, or there might be a package that I should contribute
>> to instead of writing my own:
>>
>> - Online PCA (such as "Candid covariance-free incremental principal
>> component analysis")
>> - Online flexible least squares (time-varying regression weights)
>> - Online support vector machines/regressions
>>
>> Are there any packages that might have this functionality, or even a good
>> framework that I could/should add to? Does anyone else have a need for
>> these algorithms?
>>
