I strongly agree with this! Does it mean we would need to change the source 
code of the existing functions to handle the problem the way you suggest?
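
For reference, the pairwise deletion discussed further down in the thread could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: it uses `missing` from current Julia rather than the NA/Nullable types discussed here, and `pairwise_cov` is a hypothetical helper name, not a function from StatsBase or NullableArrays. Each entry is computed from the rows where both columns of that pair are observed, regardless of what is missing in the other columns.

```julia
using Statistics

# Hypothetical helper (not part of any package): covariance matrix with
# pairwise deletion, assuming missing values are encoded as `missing`.
function pairwise_cov(X::AbstractMatrix)
    p = size(X, 2)
    C = Matrix{Float64}(undef, p, p)
    for j in 1:p, i in 1:j
        # Keep only the rows where both column i and column j are observed.
        keep = [!ismissing(X[r, i]) && !ismissing(X[r, j]) for r in axes(X, 1)]
        xi = Float64.(X[keep, i])
        xj = Float64.(X[keep, j])
        # Fewer than two complete rows for a pair gives NaN for that entry.
        C[i, j] = C[j, i] = cov(xi, xj)
    end
    return C
end

# Small example: the missing values sit in different rows of each column.
X = [1.0 2.0     missing;
     2.0 missing 1.0;
     3.0 6.0     2.0;
     4.0 8.0     5.0]
C = pairwise_cov(X)
```

One caveat with this approach: because every entry is based on a different subset of rows, the resulting matrix is symmetric but not guaranteed to be positive semidefinite.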

On Wednesday, June 8, 2016 at 7:48:42 AM UTC-5, Andreas Noack wrote:
>
> It would be great if we could come up with a solution where the 
> NA/Nullable handling wouldn't have to be hard coded in a specific 
> statistical function, say cov. It's early and I haven't had coffee yet so 
> the idea is probably flawed but, in general, it might be useful to use a 
> dedicated `Accumulator` type when doing accumulations, e.g. a sum would be 
> something like
>
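> function sum(x::AbstractVector)
>     # accumulator typed to hold the result of adding two elements
>     acc = Acc{typeof(zero(eltype(x)) + zero(eltype(x)))}(0)
>     for xx in x
>         acc !+ xx  # hypothetical in-place "add to accumulator" operator
>     end
>     return acc
> end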
>
> Then, instead of specifying the NA handling for every statistical function, 
> it would be a matter of defining something like `(!+)(x::Acc, y::Nullable) 
> = x` to "remove" the effect of NAs in the accumulation. Of course, you 
> don't always want to remove NAs, so this would have to be adjustable. What 
> kind of functionality exists in NullableArrays for handling Nullable in 
> different ways?
>
> The original reason I started to consider the accumulator type was to 
> have a way of handling memory reuse, e.g. for BigFloats and JuMP 
> expressions, but maybe it could also be useful for NA/Nullable handling.
>
>
> On Wed, Jun 8, 2016 at 4:42 AM, Milan Bouchet-Valat wrote:
>
>> On Tuesday, June 7, 2016 at 17:23 -0700, Jessica Koh wrote:
>> > Hello Andreas,
>> >
>> > Sorry I deleted the post before you commented on this. Thank you so
>> > much for your comment!
>> >
>> > Yes, I have already tried that, and that works great with 2
>> > variables. However, I am dealing with multiple variables with missing
>> > values, and the locations of missing values differ across different
>> > variables. I want the covariance function to handle missing values by
>> > pairwise deletion; all available observations should be used to
>> > calculate each pairwise covariance without regard to whether
>> > variables outside that pair are missing.
>> >
>> > I can technically write up the function from scratch to do this. But
>> > this seems like a basic problem, so I was guessing there might be
>> > some library already written that handles this. Do you suggest writing
>> > the function from scratch, or are you aware of existing functions
>> > that solve this?
>> You're right that it's an essential function. I think we should write
>> one based on the Nullable framework instead of on the NA/DataArrays one
>> (which is on its way out). That function could either live in
>> StatsBase.jl or in NullableArrays.jl.
>>
>>
>> Regards
>>
>> > > I think you'd have to remove them first. E.g. something like 
>> > >
>> > > julia> X = DataArray(randn(10,2));
>> > >
>> > > julia> X[2,1] = X[3,2] = NA;
>> > >
>> > > julia> cov(X[!vec(any(isna(X), 2)),:])
>> > > 2×2 DataArrays.DataArray{Float64,2}:
>> > >  1.19373   0.236507
>> > >  0.236507  0.524404
>> > >
>> > >
>> > > On Tue, Jun 7, 2016 at 6:26 PM, Jessica Koh wrote:
>> > > > Hi all,
>> > > >
>> > > > Is there a way to create a covariance matrix of a matrix that
>> > > > contains NA values, using the "cov()" function from StatsBase?
>> > > >
>>
>> --
>> You received this message because you are subscribed to the Google Groups 
>> "julia-stats" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
