Hi John,

I'd like to be able to manage many large arrays of numbers in FastBit where most of the positions in any given array are null. For example, I could store a collection of data as arrays of 3 billion floats, but at 12 GB per data set, space becomes an issue pretty quickly (over 1000 such data sets already exist). In practice, you partition the arrays into reasonably sized chunks to reduce memory requirements, but that doesn't reduce the size of the data on disk.

However, if you skip the null values when writing the arrays to disk and create bitvector masks to mark the positions of the non-nulls, you can avoid wasting a lot of space storing nulls. I'm not sure whether ibis::bitvector already has functions that would let you map between positions in the mask and offsets in the array.

I'd like to be able to define a partition where number_of_rows = <100,000,000> and each column has a field like sparse=true, meaning that you need to use the null mask (size = 100,000,000) to map between the array of values and the rows in the partition. Any bitmap indexes created on these sparse columns would be the same as usual, composed of bitvectors that are 100,000,000 bits long, but they would need to use these mapping functions when being constructed.
Would you consider adding this functionality to FastBit? I apologize if this is already implemented and I overlooked it.

Andrew

_______________________________________________
FastBit-users mailing list
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
