You can take some inspiration from DataArrays: https://github.com/JuliaStats/DataArrays.jl/blob/44192cf6261d6eed476018fb2770c5ca8dc4e9f2/src/operators.jl#L377-L445
The source at that link points to a blog post about speeding up R's multiplication. I haven't read it carefully, but it should get you started on the performance trade-offs here. In general, I'd bet it's faster to just multiply everything (even the elements masked by NULL/NA) and fix up the result afterwards. But if you have lots of NULL/NA elements, that might not be true.

On Monday, April 18, 2016 at 5:35:11 PM UTC-4, [email protected] wrote:
>
> > In particular, if X and Y contain NaNs in different places...
>
> I meant that if X has a NaN in row t, then row t is deleted in both X and
> Y. Example:
> X = [1;NaN]
> Y = [10;11],
> then we redefine as
> Xb = [1]
> Yb = [10]
> and get Xb'Yb = [10]
>
> This is a fairly typical approach in e.g. regression analysis. To
> explicitly find and delete (many) such rows is time and memory intensive when
> the matrices are large (e.g. 200,000 rows instead of 2, with say 8,000 rows
> that need to be deleted). I hoped NullableArrays would help here.
>
> Paul S
>
> On Monday, 18 April 2016 22:49:40 UTC+2, Milan Bouchet-Valat wrote:
>>
>> On Monday 18 April 2016 at 13:16 -0700, [email protected]
>> wrote:
>> > Hi and thanks for the reply.
>> >
>> > However, I am not sure that I fully understand
>> > >NullableArrays are not needed if you only have NaNs
>> >
>> > Maybe I have the wrong expectations about NullableArrays, but I hoped
>> > that it would provide a quick "excise": cut out all rows where there
>> > is a NaN in either X or Y and then do X'Y. Clearly, this excise can
>> > be done explicitly but that costs time and memory. Am I wrong in this
>> > expectation?
>> I'm not sure what you mean. In particular, if X and Y contain NaNs in
>> different places, removing rows/columns with NaNs may give matrices
>> with incompatible dimensions. Could you provide an example?
>>
>> > Paul S
>> >
>> > > On Monday 18 April 2016 at 07:40 -0700, [email protected]
>> > > wrote:
>> > > > Hi,
>> > > >
>> > > > I want to use NullableArrays to facilitate some multivariate
>> > > > statistics (NaNs...).
>> > > >
>> > > > If X is a NullableArray{T,K} and Y is a NullableArray{T,L}, can I do
>> > > > X'Y? (My clumsy attempts say no, but I might have missed something.)
>> > > >
>> > > > Thanks for the help /Paul S
>> > > It looks like you need to define zero():
>> > > Base.zero{T}(::Nullable{T}) = Nullable(zero(T))
>> > >
>> > > Then it works, at least for simple cases. You should probably file an
>> > > issue in GitHub against NullableArrays.jl so that we have a look at the
>> > > best solution for this. This method shouldn't be defined in Julia by
>> > > default (else many other methods will need a special treatment), but
>> > > NullableArrays could do something about this.
>> > >
>> > > BTW, NullableArrays are not needed if you only have NaNs: floats handle
>> > > them just fine. They are only useful when you have null/missing values
>> > > other than NaN, or types other than floats.
>> > >
>> > > Regards
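
To make the two options concrete for the row-deletion case Paul describes above, here is a minimal sketch in current Julia syntax. It assumes plain Float64 matrices that use NaN as the missing marker (no NullableArrays), and the function names `complete_rows`, `crossprod_delete` and `crossprod_zero` are made up for illustration. The second function is only one possible reading of "multiply everything, then fix it up": it zeroes the incomplete rows so they contribute nothing to the sums. Neither version has been benchmarked here.

```julia
using LinearAlgebra

# Mark the rows that contain no NaN in either X or Y.
function complete_rows(X::AbstractMatrix, Y::AbstractMatrix)
    size(X, 1) == size(Y, 1) ||
        throw(DimensionMismatch("X and Y need the same number of rows"))
    keep = trues(size(X, 1))
    for A in (X, Y), j in axes(A, 2), i in axes(A, 1)
        isnan(A[i, j]) && (keep[i] = false)
    end
    return keep
end

# Option 1: delete the incomplete rows explicitly, then multiply.
function crossprod_delete(X::AbstractMatrix, Y::AbstractMatrix)
    keep = complete_rows(X, Y)
    return X[keep, :]' * Y[keep, :]
end

# Option 2: keep the full shapes, replace incomplete rows with zeros,
# and multiply everything. Zero rows add nothing to X'Y.
function crossprod_zero(X::AbstractMatrix, Y::AbstractMatrix)
    keep = complete_rows(X, Y)
    Xz = ifelse.(keep, X, zero(eltype(X)))
    Yz = ifelse.(keep, Y, zero(eltype(Y)))
    return Xz' * Yz
end

# Paul's example, written as 2x1 matrices.
X = reshape([1.0, NaN], 2, 1)
Y = reshape([10.0, 11.0], 2, 1)

A = crossprod_delete(X, Y)   # 1x1 matrix containing 10.0
B = crossprod_zero(X, Y)     # same result as A
C = X' * Y                   # plain product: the NaN propagates
```

Both options still allocate new matrices as written, so whether the second one pays off likely depends on the fraction of incomplete rows, as noted at the top of this message.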
