Hi Andrew,

What you are asking for is very reasonable, and we should consider adding this feature. However, we are quite short on manpower for implementing major changes like this, so it might take a long while before we can find time to do anything about it.
John

On 10/5/12 8:21 AM, Olson, Andrew wrote:
> Hi John,
>
> I'd like to be able to manage many large arrays of numbers in
> FastBit where most of the positions in any given array are null.
> For example, I could store a collection of data as arrays of 3
> billion floats, but if you need 12 GB for each data set, space
> becomes an issue pretty quickly (over 1000 such data sets already
> exist). In practice, you partition the arrays into reasonably
> sized chunks to reduce memory requirements, but that doesn't reduce
> the size of the data on disk. However, if you skip the null values
> when writing the arrays to disk and create bitvector masks to mark
> the positions of the non-nulls, you can avoid wasting a lot of
> space storing nulls. I'm not sure if there are already functions
> in ibis::bitvector that would allow you to map between positions in
> the mask and offsets in the array. I'd like to be able to define a
> partition where number_of_rows = <100,000,000> and each column has
> a field like sparse=true that means you need to use the null mask
> (size = 100,000,000) to map between the array of values and the rows
> in the partition. Any bitmap indexes created on these sparse
> columns would be the same as usual, made of bitvectors that
> are 100,000,000 bits long, but they would need to use these mapping
> functions when being constructed.
>
> Would you consider adding this functionality to FastBit? I
> apologize if this is already implemented and I overlooked it.
>
> Andrew

_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
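The position-to-offset mapping Andrew asks about is commonly called rank (the number of non-null rows before a given row, which is that row's offset in the packed value array) and select (the inverse: the row that holds the k-th stored value). What follows is a minimal Python sketch of that idea under stated assumptions; it is not FastBit code and does not use ibis::bitvector. The class name `SparseColumn`, its methods, and the 64-bit word size are hypothetical, and the mask is kept uncompressed with a running popcount per word, whereas FastBit's own bitvectors are compressed.

```python
from bisect import bisect_right

WORD = 64  # bits per mask word (an assumption for this sketch)


class SparseColumn:
    """Hypothetical sparse column: only non-null values are stored,
    plus a validity mask with one bit per row (1 = value present)."""

    def __init__(self, nrows, dense):
        # `dense` maps row index -> value for the non-null rows only
        self.nrows = nrows
        self.words = [0] * ((nrows + WORD - 1) // WORD)
        for row in dense:
            if not 0 <= row < nrows:
                raise IndexError(row)
            self.words[row // WORD] |= 1 << (row % WORD)
        # values are packed in row order, so a row's offset == its rank
        self.values = [dense[row] for row in sorted(dense)]
        # prefix[i] = number of set bits in words[0:i]
        self.prefix = [0]
        for w in self.words:
            self.prefix.append(self.prefix[-1] + bin(w).count("1"))

    def rank(self, row):
        """Number of non-null rows strictly before `row` (0 <= row < nrows)."""
        q, r = divmod(row, WORD)
        below = self.words[q] & ((1 << r) - 1)  # bits of this word before `row`
        return self.prefix[q] + bin(below).count("1")

    def get(self, row):
        """Value stored for `row`, or None if the row is null."""
        if not 0 <= row < self.nrows:
            raise IndexError(row)
        q, r = divmod(row, WORD)
        if not (self.words[q] >> r) & 1:
            return None
        return self.values[self.rank(row)]

    def select(self, k):
        """Row position of the k-th stored value (0-based)."""
        if not 0 <= k < len(self.values):
            raise IndexError(k)
        # last word whose running count is <= k holds the wanted bit
        q = bisect_right(self.prefix, k) - 1
        w = self.words[q]
        for _ in range(k - self.prefix[q]):
            w &= w - 1  # clear the lowest set bit
        return q * WORD + (w & -w).bit_length() - 1


if __name__ == "__main__":
    col = SparseColumn(200, {3: 1.5, 70: 2.5, 150: -0.25})
    print(col.get(70), col.get(4), col.rank(150), col.select(2))
```

The per-word running count makes rank a constant-time lookup and select a binary search, at the cost of one extra integer per 64 rows. A compressed mask like FastBit's would likely need a different way to compute these, for example by walking the compressed runs.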
