1) Could the flexibility in weighting new data you're thinking about be fit 
into an optional argument to update!()?
3) Agreed.  Right now a few methods do the opposite of that with something 
like update!(obj, y::Float64) = update!(obj, [y])
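
As a rough sketch of how 1) and 3) could fit together (the RunningMean type and the weight keyword are hypothetical names for illustration, not what OnlineStats or StreamStats does today, and this uses current Julia syntax), the singleton method could take an optional weight and the batch method could simply loop over it:

```julia
# Hypothetical type for illustration; not the OnlineStats or StreamStats API.
mutable struct RunningMean
    mean::Float64   # current average (stored instead of a running sum)
    n::Int          # number of observations seen so far
end

RunningMean() = RunningMean(0.0, 0)

# Singleton update. With no weight given, each new observation gets weight 1/n,
# which reproduces the ordinary mean. A constant weight (e.g. 0.1) gives
# exponential weighting of new data instead.
function update!(obj::RunningMean, y::Real; weight::Union{Nothing,Real} = nothing)
    obj.n += 1
    w = weight === nothing ? 1 / obj.n : weight
    obj.mean += w * (y - obj.mean)
    return obj
end

# Batch update written as a loop over the singleton method.
function update!(obj::RunningMean, ys::AbstractVector{<:Real}; weight = nothing)
    for y in ys
        update!(obj, y; weight = weight)
    end
    return obj
end

state(obj::RunningMean) = obj.mean
```

Storing the mean rather than the sum also means state() is just a field access, which ties in with point 2).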

I've at least started an issue for a redesign to get a discussion going: 
https://github.com/joshday/OnlineStats.jl/issues/2.  I'd be interested in 
hearing your thoughts, as I'm a statistician pretending to be a programmer.



On Sunday, April 26, 2015 at 1:38:02 PM UTC-4, Tom Breloff wrote:
>
> 1) More flexibility is certainly good, as long as it doesn't impact 
> performance or readability. I would also like additional flexibility on 
> weighting new data. 
> 2) Aside from possible overflow, updating averages instead of sums is 
> probably more performant when you call state() more often than update!()
> 3) I would probably also like both batch and singleton updates, but with 
> well designed code you may be able to just wrap your singleton update 
> method in a loop. 
>
> My use case is in algorithmic trading, so potentially calling an update!() 
> method very frequently. As such, I'd always have an eye towards performance 
> and any implementation would reflect that. 
>
> I'd love to have a discussion about any redesign/merge. It sounds to me 
> like OnlineStats is the more natural destination for any merge... But maybe 
> John should weigh in?
>
>
> On Apr 26, 2015, at 8:51 AM, Josh Day wrote:
>
> I emailed John Myles White a few months back about merging.  One of his 
> concerns was that OnlineStats looks more ambitious, but he wanted to work 
> together.  I was focused on the implementation progress to show off for my 
> oral prelim (I'm a PhD student in statistics), so nothing ever came of it. 
>  Maybe now is the time to pull the trigger on merging.
>
> I have a few minor concerns:
> 1) I think there needs to be more flexibility in the abstract type 
> structure of StreamStats.  I'm currently using the same types for 
> OnlineStats, but I've been putting in some thought on how to improve it.  
>
> 2) The "sufficient statistics" in OnlineStats types are based on averages 
> to avoid overflow (StreamStats is based on sums)
>
> 3) I would like OnlineStats to allow both batch and singleton updates 
> (StreamStats uses singletons)
>
>
> I'm definitely open for collaboration.  What are the goals you're aiming 
> for?
>
>
> On Friday, April 24, 2015 at 5:13:15 PM UTC-4, Tom Breloff wrote:
>>
>> I'm considering writing packages for the following online (i.e. updating 
>> models on the fly as new data arrives) techniques, but this functionality 
>> might exist already, or there might be a package that I should contribute 
>> to instead of writing my own:
>>
>>    - Online PCA (such as "Candid covariance-free incremental principal 
>>    component analysis")
>>    - Online flexible least squares (time-varying regression weights)
>>    - Online support vector machines/regressions
>>
>> Are there any packages that might have this functionality, or even a good 
>> framework that I could/should add to?  Does anyone else have a need for 
>> these algorithms?
>>
>
