Thanks Thomas! I didn't consider compression, so that does alleviate my concern about large key sizes with Tuple/Array/Custom keys. I'll start by going down that route.
In terms of iteration, lets say I want to get a list of all the user ids. I may only have a value assigned to user[12345].name.first so I would like to avoid storing nodes for user[12345] and user[12345].name (ie sparse arrays) . So how would I efficiently find all the user id's w/o looping through the entire sub-trees? Is that were I might need an alternate map implementation or use secondary map(s) to describe/index the structure? Brian On Tuesday, April 2, 2013 12:26:58 PM UTC-6, Thomas Mueller wrote: > > Hi, > > I suggest to try out both the MVStore and MapDB, with approach 1), and > then see if that meets your goals. If not, my suggestion would be 6), and > then 4). > > 1) Tuple key: yes, H2 does support this. You could for example use an > object array as the key. Or your own class, but in that case you may want > to implement serialization yourself, see also > http://h2database.com/html/mvstore.html#dataTypes > > 2) Hash the keys. I'm not sure if this is what you propose? You could > store the 'real' key as part of the value (using a special data type) and > only the hashed key as the key of the map. There are two problems with this > approach: how to ensure keys are unique. If you use the SHA-1 hash, then > you are save. But the biggest problem is that keys would be randomly > distributed, which would be very very bad for performance if you access > similar keys, or try to access the keys in sorted order. So I wouldn't do > that. > > 3) Hybrid. It suffers from similar problems than 2), I wouldn't do that. > You can try of course. > > 4) Custom map / tree implementation. For MVStore, currently all map > implementations (MVMap, MVMapConcurrent, and MVRTreeMap) are simple > key-value maps, meaning each key is fully stored. In many cases this is not > optimal, specially when storing path-like structures like you do. Disk > space usage isn't actually the problem when compression is enabled (in that > case the repeated part of the key is compressed), but it would be possible > to save some time comparing the keys, meaning it's a slight performance > problem. Not that big, but it might be measurable. The MVStore allows you > to implement new map implementations quite easily. A trie / radix tree / > patricia trie might be more efficient for your use case. I don't plan to > write such a map implementation right now, but if you want please go ahead. > But doing that will require quite some time, and might not actually help > that much. It's really hard to say. > > 5) R-tree: I don't see how this could help in your case. > > 6) Normalize the data yourself, and use multiple maps. You would have a > map "first", with all distinct first names ("Foo",...) mapped to ids ("Foo" > = 1, "Fii" = 2,...). Then you have a map "last" ("Bar" = 1, "Bear" = 2). > Then you have a map users with a composite key ("Foo"/"Bar" = {1, 1}). > That's basically what you do in a relational database. The disadvantage is > that accessing the keys in sorted order is not possible / not easy. > > Regards, > Thomas > > > > > > On Tue, Apr 2, 2013 at 7:27 PM, Brian Bray <[email protected]<javascript:> > > wrote: > >> Hi, >> >> I've been watching the development of the MVStore engine as a potential >> solution for an idea I'm working on where I need to store large associative >> arrays where the data looks something like this for an example "users" >> structure. >> >> user[12345].name.first = "Foo" >> user[12345].name.last = "Bar" >> user[12345].address[1].city = "Seattle" >> etc.... >> >> This data can get very large 10-100's millions of nodes (and maybe 6-8 >> levels deep). One of my requirements is that I can iterate through all the >> nodes if necessary (usually just through very specific subtree's, IE >> iterate through all the addresses for a user). So basically I need >> a hierarchical storage engine and was wondering if MVStore (or MapDB) would >> fit my needs? >> >> Here were a few design ideas on how I could approach this with MVStore: >> >> 1) Tuple key with simple value (similar to this >> example<https://github.com/jankotek/MapDB/blob/master/src/test/java/examples/TreeMap_Composite_Key.java>from >> MapDB). Is MVStore designed for this? How could I iterate through all >> the nodes? It seems like the key storage might not be very efficient as >> the full hierarchy of keys is stored for each small value? >> >> 2) Using something like java.util.Arrays.DeepHashCode(new String[] >> {"user","12345","name","first"}) to compute a single key value whole key >> path. But then I have the iterable requirement and I'm not sure how I >> could derive all the keys. >> >> 3) Maybe a hybrid approach where I use 2 related MVMaps, one I use to >> store the raw data using a hash technique like #2, and a second MVMap to >> store the structure/hierarchy of keys, but I'm not quite sure what that >> would look like? >> >> 4) It occurred to me that maybe I need to just use raw B-Tree storage? >> Since this data looks very much like a tree and what I'm really asking for >> is a hierarchical database. Is that possible with MVStore? >> >> 5) I just noticed the part about R-Tree's and spatial queries, I suppose >> I've not done a lot of spacial queries, but might that be a solution to my >> problem as well? >> >> Anyway, I'm really excited about MVStore, it seems like a great little >> storage engine, especially since I need the MVCC-ish stuff and high >> concurrency! >> >> Thanks, >> Brian >> >> -- >> You received this message because you are subscribed to the Google Groups >> "H2 Database" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected] <javascript:>. >> To post to this group, send email to [email protected]<javascript:> >> . >> Visit this group at http://groups.google.com/group/h2-database?hl=en. >> For more options, visit https://groups.google.com/groups/opt_out. >> >> >> > > -- You received this message because you are subscribed to the Google Groups "H2 Database" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/h2-database?hl=en. For more options, visit https://groups.google.com/groups/opt_out.
