Assuming you have enough memory to write a BitArray to the JLD file 
initially, if you later open the JLD file with mmaparrays=true and read it, 
JLD will mmap the underlying Vector{Uint64} so that pieces are read from 
the disk as they are accessed. (The actual specifics of how this works are 
up to the OS, but generally it works well.) In principle you can also 
modify the BitArray and the changes will be saved to the disk, although I'm 
not sure how well that works since I don't do it in my own code. There is no 
easy way to resize the BitArray if you do this, though.
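
For illustration, here is a minimal sketch of the same idea. It does not go 
through JLD at all: it assumes a current Julia with the Mmap standard 
library and maps a raw file directly, so the temporary file, the size n, and 
the variable names are made up for the example. It only shows that a 
memory-mapped BitArray can be written in place and read back without loading 
the whole thing up front.

```julia
using Mmap

# Hypothetical scratch file and event count, chosen only for this sketch.
path = tempname()
n = 1_000_000

# Create a file-backed BitArray; the file is grown to hold n bits.
io = open(path, "w+")
sel = Mmap.mmap(io, BitArray, (n,))
sel[3] = true          # mark event 3 as selected
sel[n] = true          # mark the last event as selected
Mmap.sync!(sel)        # ask the OS to flush the changes to disk
close(io)

# Reopen the file later and map it again; pages are read as they are accessed.
io = open(path, "r+")
sel2 = Mmap.mmap(io, BitArray, (n,))
close(io)

println(sel2[3], " ", sel2[n], " ", count(sel2))
```

As with the JLD case, the length is fixed when the array is mapped, so 
growing the selection still means creating a new, larger mapping.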

Simon

On Tuesday, August 5, 2014 5:06:16 PM UTC-4, Tim Holy wrote:
>
> To me it sounds like you've come up with the main options: BitArray or
> Array{Bool}. Since a BitArray is, underneath, a Vector{Uint64} with
> different indexing semantics, it seems you could probably come up with a
> reasonable way to update just part of it. But even if you use Array{Bool},
> you're "only" talking a few hundred megabytes, which is not
> catastrophically large. Also consider keeping everything in memory; with
> 100GB of RAM you could store an awful lot of selections.
>
> --Tim 
>
> On Tuesday, August 05, 2014 12:01:58 PM ggggg wrote: 
> > Hello, 
> > 
> > I have an application where I have a few hundred million events, and I'd
> > like to make and work with different selections of sets of those events.
> > The events each have various values associated with them, say for
> > simplicity color, timestamp, and loudness. Say one selection includes all
> > the events within 5 minutes after a blue event. Or I want to select all
> > events that aren't above some loudness threshold. I'd like to be able to
> > save these selections in a JLD file for later use on some or all events.
> > I also need to be able to update the selections as I observe more events.
> > 
> > My baseline plan is to have an integer associated with each event, where
> > each bit in the integer i corresponds to a selection. So bit 1 is true for
> > events within 5 minutes and bit 2 is true for events above the loudness
> > threshold. Then for each event's integer I can do bits(i)[1] or bits(i)[2]
> > to figure out if it is included in each selection. That seems like it
> > would be inefficient since bits() returns a string, so I would have to
> > call bool(bits(i)[1]). I could use a bitwise mask of some sort, like
> > 1&i!=0 for the first bit and 2&i!=0 for the second bit.
> > 
> > A BitArray seems like a decent choice, except that you can only read/write
> > the entire array from a JLD file, rather than just a part of it. That will
> > be inefficient since I'll often want to look at only a small subset of the
> > total events. And every time I want to update for new events, I would need
> > to read the entire BitArray, extend it in memory, then delete the old JLD
> > object and replace it with a new JLD object. It seems plausible I could
> > figure out how to read/write part of a BitArray from a JLD as I've already
> > done some hacking on HDF5.jl, but that could be a large amount of work.
> > 
> > An Array{Bool} works well with JLD, and seems just as well suited as a
> > BitArray. I think it's 8 times bigger than a BitArray, and has a similar
> > space ratio to an integer (depending on how many selections I actually
> > use), because bools are stored as 1 byte? I can probably live with that,
> > although again it seems sort of inefficient.
> > 
> > Any advice on how I should go about deciding, or options I hadn't 
> > considered?  Also why does bits() return a string, instead of say 
> > Vector{Bool} or BitArray? 
>
>
