In case anyone is interested, I ended up just using a Matrix. When I need the data, I call sortrows and then pull each column out into a values vector and a weights vector, respectively, for further calculations. I thought this was easier, and I noticed that sortrows keeps the two columns "together": each row is moved as a unit, so the values in cols A & B of a given row stay paired after sorting.
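For illustration, here is a minimal sketch of the idea with made-up sample data (this is not the QuantLib.jl code itself). Note that it uses sortperm on the value column instead of sortrows, so it orders rows by value only rather than by whole rows; I chose that because sortrows was later replaced by sortslices(A, dims=1) in newer Julia versions, while sortperm is available in both. The variable names are just placeholders.

```julia
# Hypothetical sample data: column 1 holds values, column 2 holds weights.
data = [3.0 0.2;
        1.0 0.5;
        2.0 0.3]

# Sort by the value column. Indexing whole rows with the permutation
# moves each value together with its weight.
p = sortperm(data[:, 1])
sorted = data[p, :]

vals = sorted[:, 1]   # [1.0, 2.0, 3.0]
wts  = sorted[:, 2]   # [0.5, 0.3, 0.2]

# The same permutation trick keeps two separate vectors in sync,
# which addresses the "out-of-sync after sorting" concern.
x = [3.0, 1.0, 2.0]
w = [0.2, 0.5, 0.3]
q = sortperm(x)
x_sorted, w_sorted = x[q], w[q]

# Example of a weighted calculation on the extracted columns.
weighted_mean = sum(vals .* wts) / sum(wts)
```

Either way, the pairing survives the sort because one permutation is applied to both the values and the weights.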
You can see the result here:
https://github.com/pazzo83/QuantLib.jl/blob/master/src/math/statistics.jl

If anyone has a possibly more efficient way of doing this, please share!

- Chris

On Monday, March 7, 2016 at 7:39:45 PM UTC-5, Christopher Alexander wrote:
>
> Many thanks, these are very helpful! Yes, the two vectors are going to be
> of the same type. The StructsOfArrays package is interesting too, as let's
> say you have a construction like this:
>
> *arr = StructOfArrays(Pair{Float64, Float64}, 100)*
>
> You can go ahead, and populate this as you need. If you need all the
> firsts or seconds of the pairs, you can access the object's array param
> (arr.arrays), and all the firsts are in the first array, and all the
> seconds are in the second array. I noticed that at a certain size, the
> sort algo must change, and I needed to override the resize! method for the
> StructOfArrays type. I will compare the speed vs some of these other
> options.
>
> Thanks!!
>
> On Monday, March 7, 2016 at 6:13:22 PM UTC-5, tshort wrote:
>>
>> There are several options to "keep things together", particularly with
>> vectors of the same type:
>>
>> - DataFrame columns -- watch how you use columns to keep type stability
>>
>> - Nx2 Array
>>
>> - Nx2 NamedArray:
>> https://github.com/davidavdav/NamedArrays.jl
>>
>> - AxisArrays:
>> https://github.com/mbauman/AxisArrays.jl
>>
>>
>> On Mon, Mar 7, 2016 at 5:11 PM, Christopher Alexander <[email protected]>
>> wrote:
>>
>>> Yea, I was thinking about two different vectors, but then if I did any
>>> sorting, the value vector and weight vector would be out-of-sync. I'll
>>> check out this StructsOfArrays package
>>>
>>> Thanks!
>>>
>>> Chris
>>>
>>> On Monday, March 7, 2016 at 5:03:14 PM UTC-5, tshort wrote:
>>>>
>>>> It depends on what "various weighted statistical calculations"
>>>> involves. I'd start with two vectors, `x` and `w`. If you really need them
>>>> to be coupled tightly, you could define an immutable type to hold the
>>>> value
>>>> and the weight, but the two separate vectors can be faster for some
>>>> operations. Also, see:
>>>>
>>>> https://github.com/simonster/StructsOfArrays.jl
>>>>
>>>> On Mon, Mar 7, 2016 at 4:50 PM, Christopher Alexander <
>>>> [email protected]> wrote:
>>>>
>>>>> Hello all, I need to create a structure where I keep track of pairs of
>>>>> value => weight so that I can do various weighted statistical
>>>>> calculations.
>>>>>
>>>>> I know that StatsBase has a weights vector, which I plan on using, but
>>>>> the way that is set up is that it is disassociated from each of the
>>>>> values
>>>>> to which the weights are to be applied.
>>>>>
>>>>> I need the mapping that "Pair" provides, but I've noticed that there
>>>>> is no easy way, if I have an array of pairs, to grab all the first values
>>>>> or all the second values (like you can do with a dict in grabbing keys or
>>>>> values).
>>>>>
>>>>> I've tried to do something like map(first, my_array_of_pairs), but
>>>>> this is about 10x slower than if you have a dictionary of value => weight
>>>>> and just asked for the keys. I actually tried to use a dict at first,
>>>>> but
>>>>> ran into issues with duplicate values (they were overwriting each other
>>>>> because the value was the key).
>>>>>
>>>>> Any suggestions, or any better way to manipulate an array of Pairs?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Chris
>>>>>
