Hi,

I suggest trying both the MVStore and MapDB with approach 1), and then seeing whether that meets your goals. If not, my suggestion would be 6), and then 4).
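
As a rough sketch of approach 1): with a composite (tuple) key in a sorted map, all nodes of one subtree are adjacent, so a subtree can be iterated with a range scan. This is not MVStore or MapDB code; java.util.TreeMap stands in for the store's sorted map, and the class name PathKeyDemo, the comparator and the helper methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class PathKeyDemo {

    // Compare paths element by element; a path sorts before any longer path it is a prefix of.
    public static final Comparator<String[]> PATH_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = a[i].compareTo(b[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    // Sample data modelled on the "users" structure from the question.
    public static NavigableMap<String[], String> sample() {
        NavigableMap<String[], String> map = new TreeMap<>(PATH_ORDER);
        map.put(new String[] {"user", "12345", "name", "first"}, "Foo");
        map.put(new String[] {"user", "12345", "name", "last"}, "Bar");
        map.put(new String[] {"user", "12345", "address", "1", "city"}, "Seattle");
        map.put(new String[] {"user", "67890", "name", "first"}, "Fii");
        return map;
    }

    // Collect "path = value" for every entry whose key starts with the given prefix.
    public static List<String> subtree(NavigableMap<String[], String> map, String... prefix) {
        List<String> out = new ArrayList<>();
        // Start at the first key >= prefix and stop as soon as a key leaves the subtree.
        for (Map.Entry<String[], String> e : map.tailMap(prefix, true).entrySet()) {
            String[] key = e.getKey();
            if (!startsWith(key, prefix)) {
                break;
            }
            out.add(String.join("/", key) + " = " + e.getValue());
        }
        return out;
    }

    private static boolean startsWith(String[] key, String[] prefix) {
        if (key.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (!key[i].equals(prefix[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Iterate through all the addresses of one user.
        for (String line : subtree(sample(), "user", "12345", "address")) {
            System.out.println(line); // prints "user/12345/address/1/city = Seattle"
        }
    }
}
```

Whether this exact key type and comparator can be plugged into the MVStore or MapDB as-is is not shown here; the point is only that a sorted map plus a prefix-aware ordering gives you subtree iteration.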

1) Tuple key: yes, H2 does support this. You could, for example, use an object array as the key. Or your own class, but in that case you may want to implement serialization yourself; see also http://h2database.com/html/mvstore.html#dataTypes

2) Hash the keys. I'm not sure if this is what you propose? You could store the 'real' key as part of the value (using a special data type) and only the hashed key as the key of the map. There are two problems with this approach. The first is how to ensure keys are unique; if you use the SHA-1 hash, then you are safe. The bigger problem is that keys would be randomly distributed, which would be very bad for performance if you access similar keys, or try to access the keys in sorted order. So I wouldn't do that.

3) Hybrid. It suffers from similar problems to 2), so I wouldn't do that. You can try, of course.

4) Custom map / tree implementation. For MVStore, currently all map implementations (MVMap, MVMapConcurrent, and MVRTreeMap) are simple key-value maps, meaning each key is fully stored. In many cases this is not optimal, especially when storing path-like structures like you do. Disk space usage isn't actually the problem when compression is enabled (in that case the repeated part of the key is compressed), but it would be possible to save some time comparing the keys, meaning it's a slight performance problem. Not that big, but it might be measurable. The MVStore allows you to implement new map implementations quite easily. A trie / radix tree / patricia trie might be more efficient for your use case. I don't plan to write such a map implementation right now, but if you want to, please go ahead. But doing that will require quite some time, and might not actually help that much. It's really hard to say.

5) R-tree: I don't see how this could help in your case.

6) Normalize the data yourself, and use multiple maps. You would have a map "first", with all distinct first names ("Foo", ...) mapped to ids ("Foo" = 1, "Fii" = 2, ...).
Then you have a map "last" ("Bar" = 1, "Bear" = 2). Then you have a map "users" with a composite key ("Foo"/"Bar" = {1, 1}). That's basically what you do in a relational database. The disadvantage is that accessing the keys in sorted order is not possible / not easy.

Regards,
Thomas

On Tue, Apr 2, 2013 at 7:27 PM, Brian Bray <[email protected]> wrote:
> Hi,
>
> I've been watching the development of the MVStore engine as a potential
> solution for an idea I'm working on where I need to store large associative
> arrays where the data looks something like this for an example "users"
> structure.
>
> user[12345].name.first = "Foo"
> user[12345].name.last = "Bar"
> user[12345].address[1].city = "Seattle"
> etc....
>
> This data can get very large, 10s to 100s of millions of nodes (and maybe 6-8
> levels deep). One of my requirements is that I can iterate through all the
> nodes if necessary (usually just through very specific subtrees, i.e.
> iterate through all the addresses for a user). So basically I need
> a hierarchical storage engine and was wondering if MVStore (or MapDB) would
> fit my needs?
>
> Here were a few design ideas on how I could approach this with MVStore:
>
> 1) Tuple key with simple value (similar to this example
> <https://github.com/jankotek/MapDB/blob/master/src/test/java/examples/TreeMap_Composite_Key.java>
> from MapDB). Is MVStore designed for this? How could I iterate through all
> the nodes? It seems like the key storage might not be very efficient as
> the full hierarchy of keys is stored for each small value?
>
> 2) Using something like java.util.Arrays.deepHashCode(new String[]
> {"user","12345","name","first"}) to compute a single key value for the whole
> key path. But then I have the iterable requirement and I'm not sure how I
> could derive all the keys.
>
> 3) Maybe a hybrid approach where I use 2 related MVMaps, one I use to
> store the raw data using a hash technique like #2, and a second MVMap to
> store the structure/hierarchy of keys, but I'm not quite sure what that
> would look like?
>
> 4) It occurred to me that maybe I need to just use raw B-Tree storage?
> Since this data looks very much like a tree and what I'm really asking for
> is a hierarchical database. Is that possible with MVStore?
>
> 5) I just noticed the part about R-Trees and spatial queries, I suppose
> I've not done a lot of spatial queries, but might that be a solution to my
> problem as well?
>
> Anyway, I'm really excited about MVStore, it seems like a great little
> storage engine, especially since I need the MVCC-ish stuff and high
> concurrency!
>
> Thanks,
> Brian
>
> --
> You received this message because you are subscribed to the Google Groups
> "H2 Database" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/h2-database?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
