[FastBit-users] sparse data

Olson, Andrew Fri, 05 Oct 2012 08:22:12 -0700

Hi John,

I'd like to be able to manage many large arrays of numbers in FastBit where 
most of the positions in any given array are null.  For example, I could store 
a collection of data as arrays of 3 billion floats, but if you need 12 GB for 
each data set, space becomes an issue pretty quickly (over 1000 such data sets 
already exist).  In practice, you partition the arrays into reasonably sized 
chunks to reduce memory requirements, but that doesn't reduce the size of the 
data on disk.  However, if you skip the null values when writing the arrays to 
disk and create bitvector masks to mark the positions of the non-nulls, you can 
avoid wasting a lot of space storing nulls.  I'm not sure if there are already 
functions in ibis::bitvector that would allow you to map between positions in 
the mask and offsets in the array.  I'd like to be able to define a partition 
where number_of_rows = <100,000,000> and each column has a field like 
sparse=true, meaning that the null mask (size = 100,000,000) must be used to 
map between the array of values and the rows in the partition.  Any bitmap 
indexes created on these sparse columns would be the same as usual, composed 
of bitvectors that are 100,000,000 bits long, but they would need to use 
these mapping functions when being constructed.
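
To make the mapping concrete, here is a minimal sketch in Python of what I 
have in mind.  It is not FastBit code and does not use ibis::bitvector; the 
class name SparseColumn, its method names, and the BLOCK constant are all 
hypothetical, chosen only for illustration.  It stores the non-null values 
packed together, keeps an uncompressed one-bit-per-row mask, and records a 
running count of non-nulls at the start of every block of rows so that the 
row-to-offset lookup only has to scan within one block.

```python
from array import array
from bisect import bisect_right

BLOCK = 64  # rows per counting block (hypothetical tuning parameter)


class SparseColumn:
    """Packed non-null values plus a null mask covering every row."""

    def __init__(self, rows):
        # rows: a sequence in which None marks a null position
        self.n_rows = len(rows)
        self.mask = bytearray((self.n_rows + 7) // 8)  # one bit per row
        self.values = array("f")                       # non-null values only
        self.block_rank = []  # count of non-nulls before each block
        for i, v in enumerate(rows):
            if i % BLOCK == 0:
                self.block_rank.append(len(self.values))
            if v is not None:
                self.mask[i >> 3] |= 1 << (i & 7)
                self.values.append(v)

    def is_set(self, row):
        """True if the row holds a non-null value."""
        return bool(self.mask[row >> 3] & (1 << (row & 7)))

    def row_to_offset(self, row):
        """Offset into the packed values for a row, or None if it is null."""
        if not self.is_set(row):
            return None
        block = row // BLOCK
        start = block * BLOCK
        # non-nulls before the block, plus non-nulls in the block before row
        return self.block_rank[block] + sum(
            self.is_set(r) for r in range(start, row)
        )

    def offset_to_row(self, offset):
        """Row number that holds the packed value at the given offset."""
        if not 0 <= offset < len(self.values):
            raise IndexError(offset)
        # last block whose starting count does not exceed the offset
        block = bisect_right(self.block_rank, offset) - 1
        remaining = offset - self.block_rank[block]
        row = block * BLOCK
        while True:
            if self.is_set(row):
                if remaining == 0:
                    return row
                remaining -= 1
            row += 1

    def get(self, row):
        """Value stored at a row, or None if the row is null."""
        off = self.row_to_offset(row)
        return None if off is None else self.values[off]
```

With something like this, a reader of the column would call get(row) to 
fetch a value, and an index builder would call offset_to_row(offset) while 
walking the packed values so that each bit it sets lands at the right row in 
a full-length bitvector.  A real implementation would presumably work on 
compressed bitvectors rather than a plain byte array.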


Would you consider adding this functionality to FastBit?  I apologize if this 
is already implemented and I overlooked it.

Andrew
