Hello,

I have an application with a few hundred million events, and I'd like to
create and work with different selections (subsets) of those events. Each
event has various values associated with it; for simplicity, say color,
timestamp, and loudness. For example, one selection might include all events
within 5 minutes after a blue event, and another might include all events
that aren't above some loudness threshold. I'd like to save these selections
in a JLD file for later use on some or all of the events. I also need to be
able to update the selections as I observe more events.
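
Here is a minimal sketch of computing the two example selections as Bool vectors. It assumes the events are held as parallel vectors, with timestamps in seconds sorted in ascending order. `after_blue_mask` is a hypothetical name for illustration, not an existing function. Because the timestamps are sorted, one pass that remembers the most recent blue event is enough.

```julia
"""
    after_blue_mask(ts, isblue, window)

Return a Vector{Bool} that is true for every event whose timestamp is at most
`window` seconds after the most recent blue event (the blue event itself is
included). Assumes `ts` is sorted ascending.
"""
function after_blue_mask(ts::AbstractVector{<:Real}, isblue::AbstractVector{Bool}, window::Real)
    mask = zeros(Bool, length(ts))   # Array{Bool}: 1 byte per event
    last_blue = -Inf                 # no blue event seen yet
    for k in eachindex(ts)
        if isblue[k]
            last_blue = Float64(ts[k])
        end
        mask[k] = ts[k] - last_blue <= window
    end
    return mask
end

# Toy data (hypothetical): 5 events
ts       = [0.0, 60.0, 200.0, 400.0, 700.0]
isblue   = [false, true, false, false, false]
loudness = [3.0, 9.0, 1.0, 7.0, 2.0]

after_blue = after_blue_mask(ts, isblue, 300.0)  # 5 minutes = 300 s
quiet      = loudness .<= 5.0                    # "not above" a threshold of 5; a BitVector
```

Here `after_blue` comes out as `[false, true, true, false, false]` and `quiet` as `[true, false, true, false, true]`.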

My baseline plan is to store an integer i with each event, where each bit
of i corresponds to one selection. So bit 1 is true for events within 5
minutes after a blue event, and bit 2 is true for events at or below the
loudness threshold. Then for each event's integer I could use bits(i)[end]
or bits(i)[end-1] to find out whether the event is in each selection
(bits() puts the most significant bit first). That seems inefficient, since
bits() returns a string, so I would have to compare characters, e.g.
bits(i)[end] == '1'. Alternatively, I could use a bitwise mask, like
i & 1 != 0 for the first bit and i & 2 != 0 for the second bit.
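
Here is a minimal sketch of the bitmask approach, assuming one `UInt8` of flags per event, which leaves room for 8 selections. The constants and helper names (`AFTER_BLUE`, `QUIET`, `selmask`, `inselection`, `setselection`) are hypothetical. In Julia, `&` binds tighter than `==`, so `i & 1 == 0` is true when the bit is *clear*. Testing membership therefore needs `!= 0`.

```julia
const AFTER_BLUE = 1   # bit 1: within 5 minutes after a blue event
const QUIET      = 2   # bit 2: at or below the loudness threshold

# UInt8 with only bit k (1-based) set
selmask(k::Integer) = UInt8(1) << (k - 1)

# True if the event's flag byte includes selection k
inselection(f::UInt8, k::Integer) = (f & selmask(k)) != 0

# Return f with bit k set or cleared
setselection(f::UInt8, k::Integer, on::Bool) = on ? (f | selmask(k)) : (f & ~selmask(k))

flags = zeros(UInt8, 5)                          # one byte per event
after_blue = [false, true, true, false, false]   # e.g. computed as in the sketch above
quiet      = [true, false, true, false, true]
flags .= setselection.(flags, AFTER_BLUE, after_blue)
flags .= setselection.(flags, QUIET, quiet)

# Updating for a newly observed event is just appending one byte
push!(flags, setselection(setselection(0x00, AFTER_BLUE, false), QUIET, true))

# Indices of events in selection 1 but not in selection 2
selected = findall(f -> inselection(f, AFTER_BLUE) && !inselection(f, QUIET), flags)
```

Combining selections this way costs one bitwise operation per event, with no string conversion.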

A BitArray seems like a decent choice, except that JLD can only read or
write the entire array, not just part of it. That will be inefficient,
since I'll often want to look at only a small subset of the events. And
every time I want to update for new events, I would need to read the entire
BitArray, extend it in memory, then delete the old JLD object and replace it
with a new one. I could probably figure out how to read/write part of a
BitArray in a JLD file, since I've already done some hacking on HDF5.jl, but
that could be a lot of work.
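
One possible workaround, sketched here under stated assumptions: pack each selection yourself into a plain `Vector{UInt64}`, with 64 events per word. A range of events then maps to a contiguous range of words. If that vector is stored as an ordinary numeric dataset, which HDF5 can read in slices, a query only needs those words. Appending events only changes the last, partially filled word and adds new ones. This is not a JLD or BitArray API. The helpers `pack_bits`, `chunk_range` and `unpack_range` are hypothetical.

```julia
# Pack a Bool vector into UInt64 words: event i lives in word (i-1) ÷ 64 + 1,
# at bit (i-1) % 64 of that word.
function pack_bits(v::AbstractVector{Bool})
    words = zeros(UInt64, cld(length(v), 64))
    for (i, b) in enumerate(v)
        if b
            w, r = divrem(i - 1, 64)
            words[w + 1] |= UInt64(1) << r
        end
    end
    return words
end

# Word indices needed to answer a query about events lo:hi
chunk_range(lo::Integer, hi::Integer) = ((lo - 1) ÷ 64 + 1):((hi - 1) ÷ 64 + 1)

# Decode events lo:hi from only the words in chunk_range(lo, hi)
function unpack_range(words_subset::AbstractVector{UInt64}, lo::Integer, hi::Integer)
    first_word = (lo - 1) ÷ 64 + 1
    out = Vector{Bool}(undef, hi - lo + 1)
    for i in lo:hi
        w, r = divrem(i - 1, 64)
        # global word index is w + 1; shift it to an index into words_subset
        out[i - lo + 1] = ((words_subset[w + 2 - first_word] >> r) & 1) == 1
    end
    return out
end

v = [isodd(i) || i % 100 == 0 for i in 1:200]  # toy selection over 200 events
words = pack_bits(v)                            # 4 words (32 bytes) instead of 200 bytes
lo, hi = 120, 140
part = unpack_range(words[chunk_range(lo, hi)], lo, hi)  # reads only words 2:3
```

In a real file, `words[chunk_range(lo, hi)]` would be replaced by a sliced read of the stored dataset. How to do that with JLD or HDF5.jl is left as an assumption here.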

An Array{Bool} works well with JLD and seems just as well suited as a
BitArray. I think it's 8 times bigger than a BitArray, because each Bool is
stored as 1 byte? That would also put it at roughly the same size as one
integer per event (depending on how many selections I actually use). I can
probably live with that, although again it seems somewhat wasteful.
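
A quick way to check the size comparison is shown below, using a toy `n`. It also includes the arithmetic scaled to an assumed 300 million events. Note that one `UInt8` of flags per event costs the same as one Array{Bool}, but it can hold 8 selections. Eight fully used selections therefore take the same space as eight separate BitVectors.

```julia
n = 8_000                          # toy number of events

bools  = zeros(Bool, n)            # Array{Bool}: 1 byte per event, one selection
bytes8 = zeros(UInt8, n)           # one flag byte per event: up to 8 selections
bitv   = falses(n)                 # BitVector: 1 bit per event, packed into UInt64 chunks

sizeof(bools)                      # 8000 bytes of element data
sizeof(bytes8)                     # 8000 bytes
Base.summarysize(bitv)             # about 1000 bytes plus a small fixed overhead

# Per selection, scaled to 300 million events (in MB)
events = 300_000_000
events ÷ 8 / 1e6                   # 37.5 as a BitVector
events / 1e6                       # 300.0 as an Array{Bool}
```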

Any advice on how I should go about deciding, or options I haven't
considered? Also, why does bits() return a string instead of, say, a
Vector{Bool} or a BitArray?
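
Whatever the reason for bits() returning a string, a Vector{Bool} of an integer's bits is easy to build directly with shifts and masks. In Julia 0.7 and later, bits() is called `bitstring`. The sketch below assumes Julia 1.x.

```julia
i = 0x05                                  # 0b00000101: selections 1 and 3 set

s = bitstring(i)                          # "00000101", most significant bit first

# Vector{Bool}, least significant bit first, so bitvec[k] is selection k
bitvec = [((i >> (k - 1)) & 1) == 1 for k in 1:8*sizeof(i)]
```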
