Re: Using MVStore for large associative arrays?

Brian Bray Tue, 02 Apr 2013 14:04:22 -0700

Thanks Thomas!

I didn't consider compression, so that does alleviate my concern about 
large key sizes with Tuple/Array/Custom keys.  I'll start by going down 
that route.


In terms of iteration, lets say I want to get a list of all the user ids. 
 I may only have a value assigned to user[12345].name.first so I would like 
to avoid storing nodes for user[12345] and user[12345].name (ie sparse 
arrays) .  So how would I efficiently find all the user id's w/o looping 
through the entire sub-trees?  Is that were I might need an alternate map 
implementation or use secondary map(s) to describe/index the structure?

Brian


On Tuesday, April 2, 2013 12:26:58 PM UTC-6, Thomas Mueller wrote:
>
> Hi,
>
> I suggest to try out both the MVStore and MapDB, with approach 1), and 
> then see if that meets your goals. If not, my suggestion would be 6), and 
> then 4).
>
> 1) Tuple key: yes, H2 does support this. You could for example use an 
> object array as the key. Or your own class, but in that case you may want 
> to implement serialization yourself, see also 
> http://h2database.com/html/mvstore.html#dataTypes
>
> 2) Hash the keys. I'm not sure if this is what you propose? You could 
> store the 'real' key as part of the value (using a special data type) and 
> only the hashed key as the key of the map. There are two problems with this 
> approach: how to ensure keys are unique. If you use the SHA-1 hash, then 
> you are save. But the biggest problem is that keys would be randomly 
> distributed, which would be very very bad for performance if you access 
> similar keys, or try to access the keys in sorted order. So I wouldn't do 
> that.
>
> 3) Hybrid. It suffers from similar problems than 2), I wouldn't do that. 
> You can try of course.
>
> 4) Custom map / tree implementation. For MVStore, currently all map 
> implementations (MVMap, MVMapConcurrent, and MVRTreeMap) are simple 
> key-value maps, meaning each key is fully stored. In many cases this is not 
> optimal, specially when storing path-like structures like you do. Disk 
> space usage isn't actually the problem when compression is enabled (in that 
> case the repeated part of the key is compressed), but it would be possible 
> to save some time comparing the keys, meaning it's a slight performance 
> problem. Not that big, but it might be measurable. The MVStore allows you 
> to implement new map implementations quite easily. A trie / radix tree / 
> patricia trie might be more efficient for your use case. I don't plan to 
> write such a map implementation right now, but if you want please go ahead. 
> But doing that will require quite some time, and might not actually help 
> that much. It's really hard to say.
>
> 5) R-tree: I don't see how this could help in your case.
>
> 6) Normalize the data yourself, and use multiple maps. You would have a 
> map "first", with all distinct first names ("Foo",...) mapped to ids ("Foo" 
> = 1, "Fii" = 2,...). Then you have a map "last" ("Bar" = 1, "Bear" = 2). 
> Then you have a map users with a composite key ("Foo"/"Bar" = {1, 1}). 
> That's basically what you do in a relational database. The disadvantage is 
> that accessing the keys in sorted order is not possible / not easy.
>
> Regards,
> Thomas
>
>
>
>
>
> On Tue, Apr 2, 2013 at 7:27 PM, Brian Bray <[email protected]<javascript:>
> > wrote:
>
>> Hi,
>>
>> I've been watching the development of the MVStore engine as a potential 
>> solution for an idea I'm working on where I need to store large associative 
>> arrays where the data looks something like this for an example "users" 
>> structure.
>>
>> user[12345].name.first = "Foo"
>> user[12345].name.last = "Bar"
>> user[12345].address[1].city = "Seattle"
>> etc....
>>
>> This data can get very large 10-100's millions of nodes (and maybe 6-8 
>> levels deep).  One of my requirements is that I can iterate through all the 
>> nodes if necessary (usually just through very specific subtree's, IE 
>> iterate through all the addresses for a user). So basically I need 
>> a hierarchical storage engine and was wondering if MVStore (or MapDB) would 
>> fit my needs? 
>>
>> Here were a few design ideas on how I could approach this with MVStore:
>>
>> 1) Tuple key with simple value (similar to this 
>> example<https://github.com/jankotek/MapDB/blob/master/src/test/java/examples/TreeMap_Composite_Key.java>from
>>  MapDB).  Is MVStore designed for this? How could I iterate through all 
>> the nodes?  It seems like the key storage might not be very efficient as 
>> the full hierarchy of keys is stored for each small value?
>>
>> 2) Using something like java.util.Arrays.DeepHashCode(new String[] 
>> {"user","12345","name","first"}) to compute a single key value whole key 
>> path.  But then I have the iterable requirement and I'm not sure how I 
>> could derive all the keys.
>>
>> 3) Maybe a hybrid approach where I use 2 related MVMaps, one I use to 
>> store the raw data using a hash technique like #2, and a second MVMap to 
>> store the structure/hierarchy of keys, but I'm not quite sure what that 
>> would look like?
>>
>> 4) It occurred to me that maybe I need to just use raw B-Tree storage? 
>>  Since this data looks very much like a tree and what I'm really asking for 
>> is a hierarchical database.  Is that possible with MVStore?
>>
>> 5) I just noticed the part about R-Tree's and spatial queries, I suppose 
>> I've not done a lot of spacial queries, but might that be a solution to my 
>> problem as well?
>>
>> Anyway, I'm really excited about MVStore, it seems like a great little 
>> storage engine, especially since I need the MVCC-ish stuff and high 
>> concurrency!
>>
>> Thanks,
>> Brian
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "H2 Database" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] <javascript:>.
>> To post to this group, send email to [email protected]<javascript:>
>> .
>> Visit this group at http://groups.google.com/group/h2-database?hl=en.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>  
>>  
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/h2-database?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Re: Using MVStore for large associative arrays?

Reply via email to