Hi,

I suggest to try out both the MVStore and MapDB, with approach 1), and then
see if that meets your goals. If not, my suggestion would be 6), and then
4).

1) Tuple key: yes, H2 does support this. You could for example use an
object array as the key. Or your own class, but in that case you may want
to implement serialization yourself, see also
http://h2database.com/html/mvstore.html#dataTypes

2) Hash the keys. I'm not sure if this is what you propose? You could store
the 'real' key as part of the value (using a special data type) and only
the hashed key as the key of the map. There are two problems with this
approach: how to ensure keys are unique. If you use the SHA-1 hash, then
you are save. But the biggest problem is that keys would be randomly
distributed, which would be very very bad for performance if you access
similar keys, or try to access the keys in sorted order. So I wouldn't do
that.

3) Hybrid. It suffers from similar problems than 2), I wouldn't do that.
You can try of course.

4) Custom map / tree implementation. For MVStore, currently all map
implementations (MVMap, MVMapConcurrent, and MVRTreeMap) are simple
key-value maps, meaning each key is fully stored. In many cases this is not
optimal, specially when storing path-like structures like you do. Disk
space usage isn't actually the problem when compression is enabled (in that
case the repeated part of the key is compressed), but it would be possible
to save some time comparing the keys, meaning it's a slight performance
problem. Not that big, but it might be measurable. The MVStore allows you
to implement new map implementations quite easily. A trie / radix tree /
patricia trie might be more efficient for your use case. I don't plan to
write such a map implementation right now, but if you want please go ahead.
But doing that will require quite some time, and might not actually help
that much. It's really hard to say.

5) R-tree: I don't see how this could help in your case.

6) Normalize the data yourself, and use multiple maps. You would have a map
"first", with all distinct first names ("Foo",...) mapped to ids ("Foo" =
1, "Fii" = 2,...). Then you have a map "last" ("Bar" = 1, "Bear" = 2). Then
you have a map users with a composite key ("Foo"/"Bar" = {1, 1}). That's
basically what you do in a relational database. The disadvantage is that
accessing the keys in sorted order is not possible / not easy.

Regards,
Thomas





On Tue, Apr 2, 2013 at 7:27 PM, Brian Bray <[email protected]> wrote:

> Hi,
>
> I've been watching the development of the MVStore engine as a potential
> solution for an idea I'm working on where I need to store large associative
> arrays where the data looks something like this for an example "users"
> structure.
>
> user[12345].name.first = "Foo"
> user[12345].name.last = "Bar"
> user[12345].address[1].city = "Seattle"
> etc....
>
> This data can get very large 10-100's millions of nodes (and maybe 6-8
> levels deep).  One of my requirements is that I can iterate through all the
> nodes if necessary (usually just through very specific subtree's, IE
> iterate through all the addresses for a user). So basically I need
> a hierarchical storage engine and was wondering if MVStore (or MapDB) would
> fit my needs?
>
> Here were a few design ideas on how I could approach this with MVStore:
>
> 1) Tuple key with simple value (similar to this 
> example<https://github.com/jankotek/MapDB/blob/master/src/test/java/examples/TreeMap_Composite_Key.java>from
>  MapDB).  Is MVStore designed for this? How could I iterate through all
> the nodes?  It seems like the key storage might not be very efficient as
> the full hierarchy of keys is stored for each small value?
>
> 2) Using something like java.util.Arrays.DeepHashCode(new String[]
> {"user","12345","name","first"}) to compute a single key value whole key
> path.  But then I have the iterable requirement and I'm not sure how I
> could derive all the keys.
>
> 3) Maybe a hybrid approach where I use 2 related MVMaps, one I use to
> store the raw data using a hash technique like #2, and a second MVMap to
> store the structure/hierarchy of keys, but I'm not quite sure what that
> would look like?
>
> 4) It occurred to me that maybe I need to just use raw B-Tree storage?
>  Since this data looks very much like a tree and what I'm really asking for
> is a hierarchical database.  Is that possible with MVStore?
>
> 5) I just noticed the part about R-Tree's and spatial queries, I suppose
> I've not done a lot of spacial queries, but might that be a solution to my
> problem as well?
>
> Anyway, I'm really excited about MVStore, it seems like a great little
> storage engine, especially since I need the MVCC-ish stuff and high
> concurrency!
>
> Thanks,
> Brian
>
> --
> You received this message because you are subscribed to the Google Groups
> "H2 Database" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/h2-database?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/h2-database?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to