Re: Using MVStore for large associative arrays?

Thomas Mueller Sun, 21 Apr 2013 11:16:02 -0700

Hi,

> In terms of iteration, lets say I want to get a list of all the user ids.
 I may only have a value assigned to user[12345].name.first so I would like
to avoid storing nodes for user[12345] anduser[12345].name (ie sparse
arrays)


I think the best way to do that is to store a user object that contains all
relevant info for the user, using the key 12345. When iterating, you
iterate over all users.

If you really, really want to iterate over the sub-keys as you described
(which I think is rather strange), then you could create a special iterator
that internally iterates over all users, plus for each user iterate over
all sub-keys of the user.

Regards,
Thomas






On Tue, Apr 2, 2013 at 11:04 PM, Brian Bray <[email protected]> wrote:

> Thanks Thomas!
>
> I didn't consider compression, so that does alleviate my concern about
> large key sizes with Tuple/Array/Custom keys.  I'll start by going down
> that route.
>
> In terms of iteration, lets say I want to get a list of all the user ids.
>  I may only have a value assigned to user[12345].name.first so I would
> like to avoid storing nodes for user[12345] and user[12345].name (ie
> sparse arrays) .  So how would I efficiently find all the user id's w/o
> looping through the entire sub-trees?  Is that were I might need an
> alternate map implementation or use secondary map(s) to describe/index the
> structure?
>
> Brian
>
>
> On Tuesday, April 2, 2013 12:26:58 PM UTC-6, Thomas Mueller wrote:
>
>> Hi,
>>
>> I suggest to try out both the MVStore and MapDB, with approach 1), and
>> then see if that meets your goals. If not, my suggestion would be 6), and
>> then 4).
>>
>> 1) Tuple key: yes, H2 does support this. You could for example use an
>> object array as the key. Or your own class, but in that case you may want
>> to implement serialization yourself, see also http://h2database.com/**
>> html/mvstore.html#dataTypes<http://h2database.com/html/mvstore.html#dataTypes>
>>
>> 2) Hash the keys. I'm not sure if this is what you propose? You could
>> store the 'real' key as part of the value (using a special data type) and
>> only the hashed key as the key of the map. There are two problems with this
>> approach: how to ensure keys are unique. If you use the SHA-1 hash, then
>> you are save. But the biggest problem is that keys would be randomly
>> distributed, which would be very very bad for performance if you access
>> similar keys, or try to access the keys in sorted order. So I wouldn't do
>> that.
>>
>> 3) Hybrid. It suffers from similar problems than 2), I wouldn't do that.
>> You can try of course.
>>
>> 4) Custom map / tree implementation. For MVStore, currently all map
>> implementations (MVMap, MVMapConcurrent, and MVRTreeMap) are simple
>> key-value maps, meaning each key is fully stored. In many cases this is not
>> optimal, specially when storing path-like structures like you do. Disk
>> space usage isn't actually the problem when compression is enabled (in that
>> case the repeated part of the key is compressed), but it would be possible
>> to save some time comparing the keys, meaning it's a slight performance
>> problem. Not that big, but it might be measurable. The MVStore allows you
>> to implement new map implementations quite easily. A trie / radix tree /
>> patricia trie might be more efficient for your use case. I don't plan to
>> write such a map implementation right now, but if you want please go ahead.
>> But doing that will require quite some time, and might not actually help
>> that much. It's really hard to say.
>>
>> 5) R-tree: I don't see how this could help in your case.
>>
>> 6) Normalize the data yourself, and use multiple maps. You would have a
>> map "first", with all distinct first names ("Foo",...) mapped to ids ("Foo"
>> = 1, "Fii" = 2,...). Then you have a map "last" ("Bar" = 1, "Bear" = 2).
>> Then you have a map users with a composite key ("Foo"/"Bar" = {1, 1}).
>> That's basically what you do in a relational database. The disadvantage is
>> that accessing the keys in sorted order is not possible / not easy.
>>
>> Regards,
>> Thomas
>>
>>
>>
>>
>>
>> On Tue, Apr 2, 2013 at 7:27 PM, Brian Bray <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I've been watching the development of the MVStore engine as a potential
>>> solution for an idea I'm working on where I need to store large associative
>>> arrays where the data looks something like this for an example "users"
>>> structure.
>>>
>>> user[12345].name.first = "Foo"
>>> user[12345].name.last = "Bar"
>>> user[12345].address[1].city = "Seattle"
>>> etc....
>>>
>>> This data can get very large 10-100's millions of nodes (and maybe 6-8
>>> levels deep).  One of my requirements is that I can iterate through all the
>>> nodes if necessary (usually just through very specific subtree's, IE
>>> iterate through all the addresses for a user). So basically I need
>>> a hierarchical storage engine and was wondering if MVStore (or MapDB) would
>>> fit my needs?
>>>
>>> Here were a few design ideas on how I could approach this with MVStore:
>>>
>>> 1) Tuple key with simple value (similar to this 
>>> example<https://github.com/jankotek/MapDB/blob/master/src/test/java/examples/TreeMap_Composite_Key.java>from
>>>  MapDB).  Is MVStore designed for this? How could I iterate through all
>>> the nodes?  It seems like the key storage might not be very efficient as
>>> the full hierarchy of keys is stored for each small value?
>>>
>>> 2) Using something like java.util.Arrays.DeepHashCode(**new String[]
>>> {"user","12345","name","first"**}) to compute a single key value whole
>>> key path.  But then I have the iterable requirement and I'm not sure how I
>>> could derive all the keys.
>>>
>>> 3) Maybe a hybrid approach where I use 2 related MVMaps, one I use to
>>> store the raw data using a hash technique like #2, and a second MVMap to
>>> store the structure/hierarchy of keys, but I'm not quite sure what that
>>> would look like?
>>>
>>> 4) It occurred to me that maybe I need to just use raw B-Tree storage?
>>>  Since this data looks very much like a tree and what I'm really asking for
>>> is a hierarchical database.  Is that possible with MVStore?
>>>
>>> 5) I just noticed the part about R-Tree's and spatial queries, I suppose
>>> I've not done a lot of spacial queries, but might that be a solution to my
>>> problem as well?
>>>
>>> Anyway, I'm really excited about MVStore, it seems like a great little
>>> storage engine, especially since I need the MVCC-ish stuff and high
>>> concurrency!
>>>
>>> Thanks,
>>> Brian
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "H2 Database" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to h2-database...@**googlegroups.com.
>>> To post to this group, send email to [email protected].
>>>
>>> Visit this group at 
>>> http://groups.google.com/**group/h2-database?hl=en<http://groups.google.com/group/h2-database?hl=en>
>>> .
>>> For more options, visit 
>>> https://groups.google.com/**groups/opt_out<https://groups.google.com/groups/opt_out>
>>> .
>>>
>>>
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "H2 Database" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/h2-database?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/h2-database?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Using MVStore for large associative arrays?

Reply via email to