Hi Kevin,
Thank you for your interest and valuable suggestions.
> Is this support meant for databases that do not support partitions directly?
Slice is targeted at environments with multiple stand-alone database
instances, possibly even heterogeneous ones. If an application wants to
bring data from these database instances into a *single* in-memory
persistence context, then Slice can be useful.
For a database vendor that supports horizontal partitioning natively, one
will be better off with standard OpenJPA. Data distribution then becomes
a decision about the partition key rather than a user-defined policy
plug-in.
> The DistributionPolicy interface seems a bit limiting.
The contract is simple: Slice calls back with the list of configured
slices and a newly persistent instance X, and the user's policy tells
Slice which slice should store X.
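For readers new to the callback, here is a minimal Java sketch of the contract just described. The interface shape is an assumption modeled on the distribute(Object pc, List<String> slices, Object ctx) signature used in the Person example of this thread. The real Slice interface may differ, and HashDistributionPolicy is a hypothetical policy invented for illustration.

```java
import java.util.List;

// Hypothetical shape of the callback, modeled on the distribute(...) signature
// used elsewhere in this thread; the real Slice interface may differ.
interface DistributionPolicy {
    /**
     * Called once for each newly persistent instance.
     *
     * @param pc     the instance being persisted
     * @param slices logical names of the configured slices
     * @param ctx    opaque context supplied by the runtime (unused here)
     * @return the name of the slice that should store pc
     */
    String distribute(Object pc, List<String> slices, Object ctx);
}

// Illustrative (hypothetical) policy: spread instances across slices by hash code.
// Math.floorMod keeps the index non-negative even for negative hash codes.
class HashDistributionPolicy implements DistributionPolicy {
    @Override
    public String distribute(Object pc, List<String> slices, Object ctx) {
        int index = Math.floorMod(pc.hashCode(), slices.size());
        return slices.get(index);
    }
}
```

A hash-based policy like this one never refers to slice names at all, only to the size and order of the list. However, it will send instances to different slices if slices are added or removed.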
> The slice names in the configuration can not change without a
> corresponding change in the DistributionPolicy callback.
Yes and no. I am still thinking about what to do with this issue, and
thank you for your input. However, there is one guiding principle I would
like to adhere to: "Entity classes must be agnostic of the partitioned
database environment".
Why I said no: let us consider a concrete example. I am going to store
every Person whose name sorts before 'John Doe' in the first slice and
the rest in another. So my DistributionPolicy implementation looks like:
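String distribute(Object pc, List<String> slices, Object ctx) {
    // Names that sort before "John Doe" go to the first slice ...
    if (((Person)pc).getName().compareTo("John Doe") < 0)
        return slices.get(0);
    // ... and all other names go to the second slice.
    return slices.get(1);
}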
In such a case, how the slices are logically named in my configuration
is immaterial. I can call them
<property name="slice.One.ConnectionURL" value="jdbc://URL1"/>
<property name="slice.Two.ConnectionURL" value="jdbc://URL2"/>
and later rename them to
<property name="slice.ABC.ConnectionURL" value="jdbc://URL1"/>
<property name="slice.XYZ.ConnectionURL" value="jdbc://URL2"/>
without any change in application behavior.
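The claim that slice names are immaterial can be checked with a small, self-contained Java sketch. Person and NameRangePolicy are hypothetical stand-ins for the entity and policy in the example above. The policy follows the rule as stated in the prose: names sorting before "John Doe" go to the first slice. The sketch also assumes the runtime passes the slices in the same order before and after the rename.

```java
import java.util.Arrays;
import java.util.List;

// Minimal stand-in for the Person entity in the example (hypothetical).
class Person {
    private final String name;
    Person(String name) { this.name = name; }
    String getName() { return name; }
}

// Same rule as the example policy: names sorting before "John Doe" go to the
// first slice, everything else to the second. Only list positions are used.
class NameRangePolicy {
    String distribute(Object pc, List<String> slices, Object ctx) {
        if (((Person) pc).getName().compareTo("John Doe") < 0)
            return slices.get(0);
        return slices.get(1);
    }
}

class RenameDemo {
    public static void main(String[] args) {
        NameRangePolicy policy = new NameRangePolicy();
        List<String> before = Arrays.asList("One", "Two"); // slice.One, slice.Two
        List<String> after = Arrays.asList("ABC", "XYZ");  // slice.ABC, slice.XYZ
        for (String name : Arrays.asList("Alice", "Zoe")) {
            Person p = new Person(name);
            int i = before.indexOf(policy.distribute(p, before, null));
            int j = after.indexOf(policy.distribute(p, after, null));
            // The chosen position is the same under both naming schemes.
            System.out.println(name + ": " + i + " / " + j);
        }
    }
}
```

Running RenameDemo prints "Alice: 0 / 0" and "Zoe: 1 / 1". Each person lands in the same slice position under both naming schemes, so renaming the slices does not change where any instance is stored.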
> Maybe the callback could return an opaque Object based on whatever
> (key?) that could then be used by our runtime to determine the proper
> slice? With ObjectGrid, we did this via a PartitionableKey interface
> that the primary key would have to implement.
"via a PartitionableKey interface that the primary key would have to
implement": this is what may violate my guiding principle, because the
primary-key class would then have to know about partitioning. But maybe
I need to understand your suggested solution better.
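To make the comparison concrete, here is a hypothetical Java sketch of the opaque-key suggestion. In this version the key is extracted by the policy rather than by a PartitionableKey interface implemented on the primary key. KeyedDistributionPolicy, SliceResolver and Order are invented names, not Slice or ObjectGrid APIs. The hash-modulo mapping is just one possible way for the runtime to turn a key into a slice.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical variant of the callback: instead of naming a slice, the
// application returns an opaque partition key. Not part of Slice or ObjectGrid.
interface KeyedDistributionPolicy {
    Object partitionKey(Object pc);
}

// Hypothetical runtime-side resolver that maps any key to a configured slice.
// The policy extracts the key, so entity classes implement no partitioning interface.
class SliceResolver {
    private final List<String> slices;
    SliceResolver(List<String> slices) { this.slices = slices; }

    String resolve(KeyedDistributionPolicy policy, Object pc) {
        Object key = policy.partitionKey(pc);
        // Equal keys always land on the same slice; floorMod avoids negative indices.
        return slices.get(Math.floorMod(key.hashCode(), slices.size()));
    }
}

// Hypothetical entity with no partitioning code of its own.
class Order {
    final String customerId;
    Order(String customerId) { this.customerId = customerId; }
}

class KeyedDemo {
    public static void main(String[] args) {
        SliceResolver resolver = new SliceResolver(Arrays.asList("ABC", "XYZ"));
        // The policy, not the entity, decides that customerId is the key.
        KeyedDistributionPolicy byCustomer = pc -> ((Order) pc).customerId;
        // Orders of the same customer resolve to the same slice.
        System.out.println(resolver.resolve(byCustomer, new Order("c-1")));
        System.out.println(resolver.resolve(byCustomer, new Order("c-1")));
    }
}
```

This keeps entity classes free of partitioning code while letting the runtime own the key-to-slice mapping. The trade-off is that a plain hash mapping reshuffles keys whenever the number of slices changes.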
> When you mention possible "parallel execution", are you assuming the
> use of the openjpa "multithreaded" property for the EntityManagers? Or,
> would this parallel execution utilize separate EntityManagers?
Neither. A single EntityManager E uses a DistributedStoreManager DM,
which in turn holds connections to many databases DB1, DB2, etc. When E
issues a JPQL query Q, DM runs the same SQL query against DB1, DB2 and
so on, but each SQL query is executed on a separate thread drawn from a
pool. The results of the individual queries are collected, merged
according to the query's ordering, and returned to the caller as a
single result list.
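The fan-out and merge just described might look roughly like the Java sketch below. It is a simplification under stated assumptions. Each database is simulated by a function returning rows instead of a JDBC connection. ParallelQuery and execute are hypothetical names, not the DistributedStoreManager API. Ordering is restored by re-sorting the combined list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch of the fan-out/merge step. Each "slice" is any function
// that runs the query and returns its own rows; in Slice itself this would be
// SQL over a JDBC connection, here it is simulated.
class ParallelQuery {
    static <T> List<T> execute(List<Function<String, List<T>>> slices,
                               String sql,
                               Comparator<? super T> order,
                               ExecutorService pool)
            throws InterruptedException, ExecutionException {
        // 1. Submit the same query to every slice; each runs on a pooled thread.
        List<Future<List<T>>> pending = new ArrayList<>();
        for (Function<String, List<T>> slice : slices) {
            Callable<List<T>> task = () -> slice.apply(sql);
            pending.add(pool.submit(task));
        }
        // 2. Collect each slice's rows; get() blocks until that slice finishes.
        List<T> merged = new ArrayList<>();
        for (Future<List<T>> f : pending) {
            merged.addAll(f.get());
        }
        // 3. Re-apply the query's ordering across the combined results.
        merged.sort(order);
        return merged;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Two simulated databases (DB1, DB2) holding different rows.
            Function<String, List<String>> db1 = sql -> Arrays.asList("Alice", "Mallory");
            Function<String, List<String>> db2 = sql -> Arrays.asList("Bob", "Zoe");
            List<String> result = execute(Arrays.asList(db1, db2),
                    "SELECT name FROM Person ORDER BY name",
                    Comparator.<String>naturalOrder(), pool);
            System.out.println(result); // [Alice, Bob, Mallory, Zoe]
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each slice already returns its rows in ORDER BY order, a k-way merge (for example with a PriorityQueue) could replace the final sort. The full sort is used here only because it is shorter.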
> On first read, this support looks to be very cool for top-down
> development. Depending on your response to the first bullet, I find it
> harder to understand how a customer might already have a poor-man's
> version of partitioning and work upwards. Just thinking out loud...
We will have to wait for people to use it to know whether this makes
sense. Andy Schlaikjer, our first user, is trying it on 100 database
instances. Maybe Andy should comment.
Regards and thanks again for your interest --
Pinaki
-----Original Message-----
From: Kevin Sutter [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 31, 2008 4:34 PM
To: [email protected]
Subject: Re: Extension to OpenJPA for distributed databases
Pinaki,
I like the idea. I used to be involved with the ObjectGrid project here
at IBM and we used a similar technique for partitioning our in-memory
cache. I have a few questions about Slice, but for the most part, I am
in favor of including it in the OpenJPA deliverable.
- Basic question. Is this support meant for databases that do not
  support partitions directly? My experience has been that if a database
  supports partitioning directly, then the interaction with the database
  doesn't change at all. That is, the application (or the OpenJPA
  runtime in this case) does not have to change to take advantage of the
  partitioning. It's transparent. But your documentation seems to
  require slice configuration and callbacks. I'm just trying to
  understand how you see this support fitting into the partitioned
  database landscape.
- The DistributionPolicy interface seems a bit limiting. The application
  code is now very tightly linked with the configuration. The slice names
  in the configuration cannot change without a corresponding change in
  the DistributionPolicy callback. I would prefer something more general.
  Maybe the callback could return an opaque Object based on whatever
  (key?) that could then be used by our runtime to determine the proper
  slice? With ObjectGrid, we did this via a PartitionableKey interface
  that the primary key would have to implement. We would then call back
  on the getPartition() method to get the Object value, which we would
  then use to determine the partition. This could be a String value, if
  so desired, but it also allowed other Object types.
- When you mention possible "parallel execution", are you assuming the
  use of the openjpa "multithreaded" property for the EntityManagers? Or,
  would this parallel execution utilize separate EntityManagers?
- On first read, this support looks to be very cool for top-down
  development. Depending on your response to the first bullet, I find it
  harder to understand how a customer might already have a poor-man's
  version of partitioning and work upwards. Just thinking out loud...
Like I said up-front, I like the basic idea of Slice. I think we
probably need a bit more discussion on how this fits into the overall
database landscape and architecture, but eventually I would like to see
this become part of OpenJPA. Thanks and nice work.
Kevin
On Jan 30, 2008 5:44 PM, Pinaki Poddar <[EMAIL PROTECTED]> wrote:
> Hi,
> I would like to add an extension of OpenJPA that allows an
> application to transact against a set of distributed, possibly
> heterogeneous, horizontally-partitioned databases [2]. The project is
> named Slice and is similar in scope to Hibernate Shards.
> The development codebase has so far been maintained in the Apache Labs
> repository, and given its current state I propose to add the codebase
> to a new openjpa-slice module.
>
> I request that you review the current state of its implementation [1]
> and express your views on the feasibility of my proposal.
>
> Regards --
>
> Pinaki
>
> [1] Slice website:
> http://people.apache.org/~ppoddar/slice/site/index.html
> [2] dev2dev blog:
> http://dev2dev.bea.com/blog/pinaki.poddar/archive/2008/01/slice_openjpa_f_1.html
>
> Notice: This email message, together with any attachments, may
> contain information of BEA Systems, Inc., its subsidiaries and
> affiliated entities, that may be confidential, proprietary,
> copyrighted and/or legally privileged, and is intended solely for the
> use of the individual or entity named in this message. If you are not
> the intended recipient, and have received this message in error,
> please immediately return this by email and then delete it.
>