Hmm. So maybe I was too quick to say that the collocation constraint is
too inhibiting. Measured against what I expected a sharding ORM
system to provide for me, it definitely is too constraining. But I
promise to put more thought into it; maybe in different use cases it's
still OK. So I'll continue to think on this.
But I ask you guys to think about the use cases that can't be
implemented and the usability costs that the collocation constraint
places on the system.
I know that with sharding you can never execute a join across databases,
so fancier queries will not execute as expected. But baking that
limitation of sharding into the data model system itself seems like
overdoing it. Just warning people that they have to be careful not to
traverse relations that are not collocated would be fine.. we're not
children after all :) :)
But like I said, we're taking a big bet that OpenJPA slices will fit our
scale out requirements. So thank you! This is an amazing head start,
and looks solidly built and coded. So I'll keep thinking on this, the
limitations and possibilities :) And my complaints are pretty minor in
the big picture.
For example, I have a work-around to the collocation constraint, I'm
just seeing if we can make the system nicer and easier to use. My
work-around would be to store references to objects (ids), not the
objects themselves, since cross-db joins are impossible. Then in our
application we'll load the referenced objects as desired, so that we
maintain the relations, not the ORM system...
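
A minimal sketch of that work-around, in plain Java with no OpenJPA types
at all. Every name here (`Comment.userId`, `UserLoader`,
`InMemoryUserLoader`, `authorOf`) is hypothetical and only illustrates
the idea: the comment stores the user's id, and the application resolves
it against whichever database holds that user.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are OpenJPA or Slice API.
class IdReferenceSketch {

    static final class User {
        final long id;
        final String name;
        User(long id, String name) { this.id = id; this.name = name; }
    }

    // The comment keeps only the author's id, so the ORM never sees
    // a relation that crosses databases.
    static final class Comment {
        final long userId;
        final String text;
        Comment(long userId, String text) { this.userId = userId; this.text = text; }
    }

    // Stand-in for however the application finds a user by id.
    interface UserLoader {
        User findById(long id);
    }

    // In-memory stand-in: each map plays the role of one database.
    static final class InMemoryUserLoader implements UserLoader {
        private final List<Map<Long, User>> databases = new ArrayList<>();

        void addDatabase(User... users) {
            Map<Long, User> db = new HashMap<>();
            for (User u : users) db.put(u.id, u);
            databases.add(db);
        }

        @Override
        public User findById(long id) {
            // Ask each database in turn; return null if nobody has the id.
            for (Map<Long, User> db : databases) {
                User u = db.get(id);
                if (u != null) return u;
            }
            return null;
        }
    }

    // The application, not the ORM, maintains the relation.
    static User authorOf(Comment c, UserLoader loader) {
        return loader.findById(c.userId);
    }

    static String demo() {
        InMemoryUserLoader loader = new InMemoryUserLoader();
        loader.addDatabase(new User(1L, "U1"));
        loader.addDatabase(new User(2L, "U2"));
        Comment c21 = new Comment(2L, "a comment by U2");
        return authorOf(c21, loader).name;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The cost is the one already mentioned: lookups like `authorOf` become
application code, and nothing stops an id from pointing at a user that no
longer exists.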
Fernando Padilla wrote:
right, thank you :)
you have re-confirmed how I thought the collocation constraint worked,
and you also gave me a great motivation why the "replicated" feature
came about ( as a work around for the collocation constraint ).
So now we're back to square one. Looking at my example use case, the
collocation constraint is still too inhibiting. I want to get rid of
those requirements! :)
So if you wanted to remove that requirement, how would you go about it?
What code would you look at, etc etc. If I want to put work into
fixing this up, where should I begin to look, etc etc. what are some
possible plans.. :) :) :)
Pinaki Poddar wrote:
One key aspect of the data distribution model used in Slice is that the
distribution policy is based at instance level and *not* at class level.
What it implies for your given scenario is that while User instance U1
can be persisted in Slice A, another User instance U2 can be stored in
Slice B. So it is not necessary that all User instances are stored in
one Slice and all Comment instances are in a different slice and so
forth.
But what about related instances? For the sake of concreteness let us
consider the following instances and relations:
User U1 belongs to Group G1 and has commented C11, C12, C13
User U2 belongs to Group G1 and has commented C21
The distribution policy determines that U1 and U2 are stored in Slice A
and B respectively.
The collocation constraint forces that any instance reachable from U1
(i.e. closure of U1 in Graph theory terms) is stored in Slice A and any
instance reachable from U2 is stored in Slice B. Thus, C11, C12, C13 go
to Slice A while C21 goes to Slice B.
Where does G1 go? G1 is reachable from both U1 and U2. The only current
option is G1 is annotated as @Replicated and identical copies of G1 are
stored in both Slice A and B.
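
The placement described above can be simulated in a few lines of plain
Java. This is only a sketch of the rule as stated in this message, not
Slice's implementation: `place`, `example` and the string node names are
hypothetical, and "all slices" is assumed to mean the slices chosen for
the roots.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical simulation of the collocation rule; not OpenJPA code.
class CollocationSketch {

    // refs:       instance -> instances it refers to
    // rootSlice:  slice chosen by the distribution policy for each root
    // replicated: instances that are copied to every slice
    static Map<String, Set<String>> place(Map<String, List<String>> refs,
                                          Map<String, String> rootSlice,
                                          Set<String> replicated) {
        Set<String> allSlices = new TreeSet<>(rootSlice.values());
        Map<String, Set<String>> placement = new TreeMap<>();
        for (String r : replicated) {
            placement.put(r, new TreeSet<>(allSlices));
        }
        for (Map.Entry<String, String> root : rootSlice.entrySet()) {
            Deque<String> todo = new ArrayDeque<>();
            Set<String> seen = new HashSet<>();
            todo.push(root.getKey());
            while (!todo.isEmpty()) {
                String node = todo.pop();
                // Replicated instances are already everywhere and refer to none.
                if (!seen.add(node) || replicated.contains(node)) continue;
                Set<String> slices = placement.computeIfAbsent(node, k -> new TreeSet<>());
                slices.add(root.getValue());
                if (slices.size() > 1) {
                    throw new IllegalStateException(
                        node + " is reachable from roots in different slices: " + slices);
                }
                for (String next : refs.getOrDefault(node, Collections.<String>emptyList())) {
                    todo.push(next);
                }
            }
        }
        return placement;
    }

    // The U1/U2/G1 example from this message.
    static Map<String, Set<String>> example(boolean replicateGroup) {
        Map<String, List<String>> refs = new LinkedHashMap<>();
        refs.put("U1", Arrays.asList("G1", "C11", "C12", "C13"));
        refs.put("U2", Arrays.asList("G1", "C21"));
        Map<String, String> rootSlice = new LinkedHashMap<>();
        rootSlice.put("U1", "A");
        rootSlice.put("U2", "B");
        Set<String> replicated = replicateGroup
            ? Collections.singleton("G1")
            : Collections.<String>emptySet();
        return place(refs, rootSlice, replicated);
    }

    public static void main(String[] args) {
        System.out.println(example(true));
    }
}
```

With G1 replicated, the comments follow their user and G1 lands in both
slices; calling `example(false)` throws, which is the situation the
@Replicated annotation exists to avoid.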
Of course, the collocation constraint will prohibit G1 from having a
relation to U1 and U2. So @Replicated mainly serves to model 'master'
data, i.e. data that is referred to by many but itself refers to none.
However, the relationship is not completely lost. For example, a query
such as "select u from User u where u.group.name='G1'" will fetch both
U1 and U2 by executing parallel queries across Slice A and B and merging
the results.
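
The "parallel queries, then merge" step can be pictured with a small
scatter-gather sketch. It uses only `java.util.concurrent`; the slices
are plain in-memory lists, and `queryAll`, `example` and the extra user
U3 in group G2 are made up for illustration. It says nothing about how
Slice itself schedules or merges queries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical scatter-gather sketch; not OpenJPA code.
class ScatterGatherSketch {

    static final class User {
        final String name;
        final String groupName;
        User(String name, String groupName) { this.name = name; this.groupName = groupName; }
    }

    // Runs the same query against every slice in parallel and
    // concatenates the per-slice results in slice order.
    static <S, R> List<R> queryAll(List<S> slices, Function<S, List<R>> query) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, slices.size()));
        try {
            List<Future<List<R>>> pending = new ArrayList<>();
            for (S slice : slices) {
                Callable<List<R>> task = () -> query.apply(slice);
                pending.add(pool.submit(task));
            }
            List<R> merged = new ArrayList<>();
            for (Future<List<R>> f : pending) {
                merged.addAll(f.get());
            }
            return merged;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("slice query failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // Names of users whose group is G1, gathered from both slices.
    static List<String> example() {
        List<User> sliceA = Arrays.asList(new User("U1", "G1"));
        // U3/G2 is an invented row so the filter has something to exclude.
        List<User> sliceB = Arrays.asList(new User("U2", "G1"), new User("U3", "G2"));
        List<List<User>> slices = Arrays.asList(sliceA, sliceB);
        return queryAll(slices, slice -> {
            List<String> names = new ArrayList<>();
            for (User u : slice) {
                if ("G1".equals(u.groupName)) names.add(u.name);
            }
            return names;
        });
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

Each slice can answer the group filter locally because its own copy of
G1 is there, which is why the replicated copy keeps this query working.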
Fernando Padilla wrote:
So, now that I have some attention, I'll post up a question I sent
out a month ago.
I want to make a connected datamodel, but I want to put objects on
different databases..
Let's say I have 3 objects:
User (slice root)
- name
Group (slice root)
- name
- users
Comment (slice grouped with group)
- group
- user
- text
As you can see they are all inter-related. But let's say I want to
distribute Users and Groups across databases. They are related,
but can't be collocated.
So can you help me understand the "collocation" limitation of slices,
and a way to enhance it to remove this limitation ( if I understand
it properly ).
ps - If I understand the limitation, I can't have a ManyToMany
relationship from Group to Users, or ManyToOne from Comment to User;
instead I would have to have a set of userIds, and I would have to
load up each user object myself through code.