Hi, I have a dataset of about 30 million observations of the type

```julia
Tuple{Person, Array{DataA,1}, Array{DataB,1}}
```

where

```julia
immutable Person  # simplified
    id::Int32
    female::Bool
    age::Int8
end

immutable DataA
    startdate::Int32
    enddate::Int32
    kind::Int8    # `type` is a keyword
    extra::UInt8
end
```

and `DataB` is similar but different. The vectors of `DataA` and `DataB` have about 2-15 elements; the length varies between observations.

I would like to dump this data in the most compact format possible. The intention is to read it back later and produce various summary statistics for each observation, i.e. a mostly linear traversal of the file, many times, but...

... it would be great if I could look up the observation with a particular `id` field in `Person` without scanning the whole file. This is not absolutely necessary, though, if the overhead would be large.

It does not matter if the format is not compatible between Julia versions; given the data size, I just want to save space (the data can be regenerated before dumping, which takes a few hours).

The question is: what do you recommend in Julia? A "homebrew database" would be

1. opening a stream with `GZip.open`,
2. using `write` to dump the data,
3. optionally recording the file position of each observation and saving those positions in a separate file.

This is not much to implement; however, if I could get the same thing with a bit of tweaking of something more standard, I would be happier. I could use HDF5, but I am concerned about the overhead (I did not try it).

Thanks,

Tamas
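PS: to make the homebrew option concrete, here is a minimal sketch of what I have in mind, assuming the simplified definitions above (the `DataB` vector is omitted; it would be handled the same way). I use a plain file here so that the saved positions can be `seek`ed to directly; the same `write`/`read` calls should also work on a `GZip.open` stream, but then random access by position means decompressing from the start.

```julia
# One length-prefixed record per observation, plus an id => position index
# that can be serialized to a separate file.

function write_record(io::IO, p::Person, as::Vector{DataA})
    write(io, p.id)
    write(io, p.female)
    write(io, p.age)
    write(io, Int8(length(as)))         # 2-15 elements, fits in an Int8
    for a in as
        write(io, a.startdate)
        write(io, a.enddate)
        write(io, a.kind)
        write(io, a.extra)
    end
end

function read_record(io::IO)
    p = Person(read(io, Int32), read(io, Bool), read(io, Int8))
    n = read(io, Int8)
    as = [DataA(read(io, Int32), read(io, Int32), read(io, Int8), read(io, UInt8))
          for _ in 1:n]
    (p, as)
end

function dump_all(filename, observations)
    index = Dict{Int32,Int64}()         # Person id => byte offset of its record
    open(filename, "w") do io
        for (p, as) in observations
            index[p.id] = position(io)  # record the offset before writing
            write_record(io, p, as)
        end
    end
    index                               # save this separately for the id lookup
end
```

Looking up one observation is then `seek(io, index[id])` followed by `read_record(io)`, and the linear traversal just calls `read_record` until `eof(io)`.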
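PPS: regarding the HDF5 overhead: if each observation became its own dataset or group, the per-object overhead would presumably dwarf records this small at 30 million observations. An alternative that should sidestep this is flattening the ragged vectors into a few long columns plus an offsets array, so the whole file is a handful of large homogeneous arrays that chunk and compress well. A hypothetical sketch of the flattening (names made up; only two `DataA` fields shown, the rest and `DataB` are analogous):

```julia
# CSR-style flattening: rows of observation i live at offsets[i]:(offsets[i+1]-1).
function flatten(observations)          # Vector of (Person, Vector{DataA}) pairs
    ids     = Int32[p.id for (p, as) in observations]
    starts  = Int32[]                   # all DataA.startdate values, concatenated
    kinds   = Int8[]                    # all DataA.kind values, concatenated
    offsets = Int64[1]
    for (p, as) in observations
        for a in as
            push!(starts, a.startdate)
            push!(kinds, a.kind)
        end
        push!(offsets, offsets[end] + length(as))
    end
    (ids, starts, kinds, offsets)
end
```

Each of these vectors could then be written as a single chunked, compressed HDF5 dataset (or with the homebrew writer above), and the id lookup reduces to a `Dict` over `ids`, or `searchsortedfirst` if the ids are stored sorted.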