1) More flexibility is certainly good, as long as it doesn't hurt performance 
or readability. I would also like more flexibility in how new data is weighted. 
2) Aside from the overflow issue, updating averages instead of sums is probably 
more performant when you call state() more often than update!().
3) I would probably also like both batch and singleton updates, but with 
well-designed code you may be able to just wrap your singleton update method 
in a loop. 
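
To make the three points above concrete, here is a rough sketch in Julia. 
RunningMean and its fields are hypothetical names made up for illustration, 
not the actual OnlineStats or StreamStats API; update!() and state() simply 
follow the naming used above. It stores the average rather than the sum, 
accepts an optional weight on new data, and builds the batch update by looping 
over the singleton update.

```julia
# Hypothetical type for illustration; not the OnlineStats/StreamStats API.
mutable struct RunningMean
    mean::Float64   # current average (stored directly, instead of a sum)
    n::Int          # number of observations seen so far
end

RunningMean() = RunningMean(0.0, 0)

# Because the average itself is stored, state() is just a field read.
state(o::RunningMean) = o.mean

# Singleton update, equal weighting: the new point gets weight 1/n.
function update!(o::RunningMean, x::Real)
    o.n += 1
    o.mean += (x - o.mean) / o.n
    return o
end

# Singleton update with a caller-supplied weight w in (0, 1] on the new point.
# A constant w gives an exponentially weighted average; note that the first
# observation is blended with the initial 0.0 unless w == 1.
function update!(o::RunningMean, x::Real, w::Real)
    o.n += 1
    o.mean += w * (x - o.mean)
    return o
end

# Batch update written as a loop over the singleton method.
function update!(o::RunningMean, xs::AbstractVector{<:Real})
    for x in xs
        update!(o, x)
    end
    return o
end

o = RunningMean()
update!(o, [1.0, 2.0, 3.0])   # batch
update!(o, 6.0)               # singleton
state(o)                      # average of 1, 2, 3 and 6
```

The trade-off is that update!() does a division on every call, whereas a 
sum-based version would defer that work to state(), so which one wins depends 
on how often each is called.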

My use case is algorithmic trading, so I'd potentially be calling an update!() 
method very frequently. As such, I'd always have an eye on performance, and 
any implementation would reflect that. 

I'd love to have a discussion about any redesign/merge. It sounds to me like 
OnlineStats is the more natural destination for any merge... But maybe John 
should weigh in?


> On Apr 26, 2015, at 8:51 AM, Josh Day wrote:
> 
> I emailed John Myles White a few months back about merging.  One of his 
> concerns was that OnlineStats looks more ambitious, but he wanted to work 
> together.  I was focused on the implementation progress to show off for my 
> oral prelim (I'm a PhD student in statistics), so nothing ever came of it.  
> Maybe now is the time to pull the trigger on merging.
> 
> I have a few minor concerns:
> 1) I think there needs to be more flexibility in the abstract type structure 
> of StreamStats.  I'm currently using the same types for OnlineStats, but I've 
> been putting in some thought on how to improve it. 
> 
> 2) The "sufficient statistics" in OnlineStats types are based on averages to 
> avoid overflow (StreamStats is based on sums).
> 
> 3) I would like OnlineStats to allow both batch and singleton updates 
> (StreamStats uses singletons)
> 
> 
> I'm definitely open to collaboration.  What are the goals you're aiming for?
> 
> 
>> On Friday, April 24, 2015 at 5:13:15 PM UTC-4, Tom Breloff wrote:
>> I'm considering writing packages for the following online (i.e. updating 
>> models on the fly as new data arrives) techniques, but this functionality 
>> might exist already, or there might be a package that I should contribute to 
>> instead of writing my own:
>> - Online PCA (such as "Candid covariance-free incremental principal 
>>   component analysis")
>> - Online flexible least squares (time-varying regression weights)
>> - Online support vector machines/regressions
>> Are there any packages that might have this functionality, or even a good 
>> framework that I could/should add to?  Does anyone else have a need for 
>> these algorithms?
