Hi, Andrew,

This is a very reasonable request, and we should consider adding this
feature.  However, we are quite short on manpower for major changes
like this, so it may be a long while before we can find time to work
on it.

John


On 10/5/12 8:21 AM, Olson, Andrew wrote:
> Hi John,
> 
> I'd like to be able to manage many large arrays of numbers in
> FastBit where most of the positions in any given array are null.
> For example, I could store a collection of data as arrays of 3
> billion floats, but if you need 12 GB for each data set, space
> becomes an issue pretty quickly (over 1000 such data sets already
> exist).  In practice, you partition the arrays into reasonable
> sized chunks to reduce memory requirements, but that doesn't reduce
> the size of the data on disk.  However, if you skip the null values
> when writing the arrays to disk and create bitvector masks to mark
> the positions of the non-nulls, you can avoid wasting a lot of
> space storing nulls.  I'm not sure if there are already functions
> in ibis::bitvector that would allow you to map between positions in
> the mask and offsets in the array.  I'd like to be able to define a
> partition where number_of_rows = <100,000,000> and each column has
> a field like sparse=true that means you need to use the null mask
> (size= 100,000,000) to map between the array of values and the rows
> in the partition.  Any bitmap indexes created on these sparse
> columns would be the same as usual, composed of bitvectors that
> are 100,000,000 bits long, but they would need to use these mapping
> functions when being constructed.
> 
> Would you consider adding this functionality to FastBit?  I
> apologize if this is already implemented and I overlooked it.
> 
> Andrew
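
For illustration, the mapping described in the quoted message (between
positions in the null mask and offsets in the array of stored values)
can be sketched in standalone C++.  This is only a minimal sketch under
stated assumptions, not FastBit code: SparseColumn, offset_of, row_of,
kBlock, and kNull are hypothetical names, std::vector<bool> stands in
for a compressed bitvector, and NaN is assumed to mark a null in the
dense input.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; none of this is FastBit's API.
const std::size_t kBlock = 64;                           // rows per rank block
const std::size_t kNull = static_cast<std::size_t>(-1);  // "row holds no value"

struct SparseColumn {
    std::vector<bool> mask;         // mask[row] is true when the row is non-null
    std::vector<float> values;      // non-null values only, in row order
    std::vector<std::size_t> rank;  // rank[b] = non-nulls before row b * kBlock

    // Build from a dense array; NaN is assumed to mark a null position.
    explicit SparseColumn(const std::vector<float>& dense) {
        mask.reserve(dense.size());
        for (std::size_t row = 0; row < dense.size(); ++row) {
            if (row % kBlock == 0) rank.push_back(values.size());
            const bool present = !std::isnan(dense[row]);
            mask.push_back(present);
            if (present) values.push_back(dense[row]);
        }
    }

    std::size_t rows() const { return mask.size(); }

    // Map a row number to its offset in `values`, or kNull if the row is null.
    std::size_t offset_of(std::size_t row) const {
        assert(row < mask.size());
        if (!mask[row]) return kNull;
        std::size_t off = rank[row / kBlock];
        // Count the non-nulls between the start of the block and this row.
        for (std::size_t r = row - row % kBlock; r < row; ++r)
            if (mask[r]) ++off;
        return off;
    }

    // Map an offset in `values` back to the row number it belongs to.
    std::size_t row_of(std::size_t offset) const {
        assert(offset < values.size());
        // The last block whose starting rank is <= offset contains the value.
        const std::size_t b =
            (std::upper_bound(rank.begin(), rank.end(), offset) - rank.begin()) - 1;
        std::size_t off = rank[b];
        for (std::size_t r = b * kBlock; r < mask.size(); ++r) {
            if (!mask[r]) continue;
            if (off == offset) return r;
            ++off;
        }
        return kNull;  // not reached when the precondition holds
    }
};
```

With 100,000,000 rows per partition, an uncompressed mask of this kind
takes 12.5 MB per column, and the rank table adds one counter per 64
rows.  A compressed bitvector would change those costs, and the lookup
code would need to follow its encoding.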
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users