Before we return to the fray, let me point out that my signature is
misleading. My boss gave me the title of 'software engineer', so I
use it. In reality my only qualifications are electrical, and
currently I am in software operations. This, clearly, does not
qualify me as an expert in this field, so take what I say as my
opinion, often me thinking out loud. I also am ready to be convinced,
but I am not ready to accept a position blindly. Perhaps this makes
this the wrong forum for this discussion; if so, someone'll probably
let us know soon enough with a (metaphorical) well aimed brick.
[EMAIL PROTECTED] writes:
> On Oct 4, Tom Cook quoth:
[snip]
> I found this to be so
> full of assumptions I felt it necessary to point out just how many of
> these assumptions would have to be true in order for this to be the case.
Fair.
> > It just seems to me (and I am no expert in the field) that it makes
> > sense to have your database tightly linked to the key generator for
> > that database.
>
> Your assertion here (and correct me if this isn't what the above sentence
> says), databases & their key generators should be tightly linked.
Yes.
> I'm
> interpreting this to mean "makes sense to have your [EJBs] tightly linked
> to the key generator for [the] database [they are stored in]".
Not quite. While I am ready to support databases and key generators
being linked, I am not ready to break encapsulation in quite this
way. EJBs should not be dependent on the _implementation_ of a key
generator, only its interface. I am merely advocating the use of a
database-generated key as a good implementation. This does, of
course, mean that no entity beans can be created if the database goes
down, but the database going down will cause other problems which will
manifest themselves before this one. And if your database is in a
clustered configuration and fails over, the sequence (or whatever)
used for key generation goes with it.
[snip agreement on databases being behind EJBs]
> > Of course
> > having one database generate keys for another is not sensible since,
> > as you point out, if your key generator goes down then your other
> > databases are stuffed.
>
> Agreed, however your Container is the transaction engine and therefore
> ultimately responsible for the behavior of both. Recall that your
> proposal was for the database to be the key generator for the container
> which the beans would call upon for new keys.
Beans which are stored in that database, yes. It could even be
considered a good move to have separate generators for each table,
thereby maximizing your usage of the key space, but I haven't really
thought hard about that idea, so don't take it as something I hold to.
> > You may well argue that we want to be able to move this to any
> > database backend;
>
> Corollary: flexibility to deal with unforeseen future requirements is a
> good thing. Agreed 100% and I would argue that except in the most trivial
> of applications this is essential.
Yes, but the OO paradigm's way of coping with this is through
abstraction. So the interface to our key generator is independent of
the implementation.
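As an illustration only, the separation of interface from implementation might look like the sketch below. The names KeyGenerator, InMemoryKeyGenerator and OrderFactory are hypothetical and come from no EJB API; the in-memory counter merely stands in for whatever implementation (database-backed or otherwise) sits behind the interface.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical interface: all that a bean ever sees of key generation. */
interface KeyGenerator {
    long nextKey();
}

/** Illustrative stand-in; a database-backed implementation could replace it. */
final class InMemoryKeyGenerator implements KeyGenerator {
    private final AtomicLong next = new AtomicLong(1);

    @Override
    public long nextKey() {
        return next.getAndIncrement();
    }
}

/** Hypothetical client: it depends on the interface only. */
final class OrderFactory {
    private final KeyGenerator keys;

    OrderFactory(KeyGenerator keys) {
        this.keys = keys;
    }

    String newOrderId() {
        return "ORDER-" + keys.nextKey();
    }
}
```

Swapping the generator then means changing only the object handed to the client's constructor.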
[snip]
I've just snipped the entire bit about enterprise applications not
moving much. This is not because I can't answer it and am ignoring
it, but because it was a rash line of argument which will not hold up
and I retract it.
[snip request for a supporting argument]
Position one - database generated keys.
---------------------------------------
The high word of the key is obtained from a database and the low word
is maintained in an internal sequence. Note that here a word is not
some fixed or system-dependent size (i.e. not necessarily 16 or 32
bits).
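A minimal sketch of this scheme, under stated assumptions: the class name HiLoKeyGenerator is made up for illustration, the high word is supplied by a LongSupplier that is assumed to be backed by a database sequence (atomic and never repeating), and key-space rollover is not handled.

```java
import java.util.function.LongSupplier;

/**
 * Sketch of the high-word/low-word scheme. The high word comes from a
 * caller-supplied source (assumed to be a database sequence); the low
 * word is a counter kept inside this object.
 */
final class HiLoKeyGenerator {
    private final LongSupplier highWordSource; // assumption: atomic, never repeats
    private final int lowBits;                 // width of the low word in bits
    private final long lowLimit;               // 2^lowBits
    private final Object lock = new Object();
    private long high;
    private long low;

    HiLoKeyGenerator(LongSupplier highWordSource, int lowBits) {
        if (lowBits < 1 || lowBits > 62) {
            throw new IllegalArgumentException("lowBits must be in 1..62");
        }
        this.highWordSource = highWordSource;
        this.lowBits = lowBits;
        this.lowLimit = 1L << lowBits;
        this.low = lowLimit; // forces a fetch of the high word on first use
    }

    /** Returns the next key; rollover of the high word is not handled here. */
    long nextKey() {
        synchronized (lock) {
            if (low == lowLimit) {
                // Low word exhausted: this is the only point that touches the source.
                high = highWordSource.getAsLong();
                low = 0;
            }
            return (high << lowBits) | low++;
        }
    }
}
```

With a 2-bit low word and a source that starts at 5, the generator would hand out 20, 21, 22, 23 and then fetch a new high word before returning 24.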
pros:
(1) Keys are guaranteed to be unique (until key-space rollover).
The database makes getting a new value from a sequence and
incrementing the sequence an atomic operation, guaranteeing that
you cannot get a duplicate high word. The key generator can
then make getting a new low word and incrementing the internal
sequence atomic (using a lock object) and we have guaranteed
unique keys.
(2) Makes data consistency/uniqueness the responsibility of the data
store.
Although this may be considered an unnecessary system
dependency, it may also be considered a good piece of
encapsulation. Now if we can insert into the database then we
can create keys, and if we can't insert into the database then
we can't create keys. There seems little point in making your
key generator independent of the database, since the _only_ use
for the key generator is in generating keys for the database.
(3) High performance.
For n-bit low words, you need to access the database only once
every 2^n calls to the key generation function. Other than this,
all you have to do is an addition and comparison on each call -
maybe 20 instructions if your compiler is hopeless.
(4) Simple and clean implementation.
It's a fifty-liner if you're sloppy with it.
(5) Makes maximum use of the key space.
This may sound like I like being pedantic about efficiency, but
this is a real concern. Take your system where you append a
microsecond-resolution timestamp to your key. Say your system
scores, on average, one million inserts in a day. There are
86,400 million (86,400,000,000) microseconds in a day, so you
have just wasted 86,399,000,000 values in your keyspace. At this
rate you waste 86,399/86,400 of your keys, or 99.9988% of them.
This means that, for every row in your database, there are about
sixteen and a half wasted bits (log2 of 86,400 is roughly 16.4)
just in the timestamp. Over one day that's about 16 Mbit (2 MB)
of wasted space in your database. Over one year, that's about
6 Gbit, or three quarters of a gigabyte. It adds up, and this is
a site under fairly heavy usage (in one year it will collect 365
million records). On a lightly loaded site the wastage per row
will be much worse.
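The arithmetic in pro (5) can be checked with a few lines of Java. This is only a sketch: the class and method names are invented for the purpose, and the one-million-inserts-per-day load is the figure assumed in the example above, not a measurement.

```java
/** Arithmetic for the timestamp example; the load figure is an assumption. */
final class KeySpaceWaste {
    /** Microseconds in one day: 86,400,000,000. */
    static final long MICROS_PER_DAY = 24L * 60 * 60 * 1_000_000;

    /** Average number of timestamp bits per row that carry no information. */
    static double wastedBitsPerRow(long insertsPerDay) {
        return Math.log((double) MICROS_PER_DAY / insertsPerDay) / Math.log(2);
    }

    /** Fraction of the possible timestamp values that go unused in a day. */
    static double unusedFraction(long insertsPerDay) {
        return 1.0 - (double) insertsPerDay / MICROS_PER_DAY;
    }

    /** Total wasted bits accumulated over the given number of days. */
    static double wastedBits(long insertsPerDay, int days) {
        return wastedBitsPerRow(insertsPerDay) * insertsPerDay * days;
    }
}
```

Calling wastedBitsPerRow with a smaller insert rate shows the per-row waste growing as the site gets quieter.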
cons:
(1) Database failure will make keys unavailable.
The significance of this is very debatable, since absence of the
database tends to make primary key generation an unnecessary
operation, unless you have some sort of caching mechanism which
hopes that the database will come back up before it runs out of
memory.
(2) Sequences are implemented differently on different databases,
making your key generator non-portable to another database.
Re-implementing a key generator is 50 lines of code if you're
being sloppy about it. The new guy, the one you don't know what
to do with yet, he should be able to do this in a few minutes if
your documentation's worth the bandwidth used to download it.
(3) Sequences are not implemented on every database.
This is an admitted deficiency. However, while not wanting to
start a flame war I'm sure we've all read many times before, most
databases worth their salt have one. (Note that my opinions on
which databases are worth their salt are rather restrictive, but
why not use the best?)
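To give a feel for how small the database-specific part in con (2) can be, here is a hedged sketch. SequenceDialect and SequenceHighWordSource are hypothetical names; Oracle and PostgreSQL appear only because their sequence syntax is widely known, and the sequence name is assumed to be a trusted identifier, not user input.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** The only part that differs per database: the SQL that reads a sequence. */
enum SequenceDialect {
    ORACLE {
        @Override
        String nextValueSql(String sequence) {
            return "SELECT " + sequence + ".NEXTVAL FROM DUAL";
        }
    },
    POSTGRESQL {
        @Override
        String nextValueSql(String sequence) {
            return "SELECT nextval('" + sequence + "')";
        }
    };

    abstract String nextValueSql(String sequence);
}

/** Hypothetical helper that fetches one high word over plain JDBC. */
final class SequenceHighWordSource {
    private final String sql;

    SequenceHighWordSource(SequenceDialect dialect, String sequence) {
        this.sql = dialect.nextValueSql(sequence); // sequence: trusted name only
    }

    long next(Connection connection) throws SQLException {
        try (Statement st = connection.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            if (!rs.next()) {
                throw new SQLException("sequence query returned no row");
            }
            return rs.getLong(1);
        }
    }
}
```

Porting to another database with sequences would then come down to adding one more constant to the enum.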
Position two - system information keys.
---------------------------------------
The key is constructed from the concatenation of:
- the IP address of the host
- process id
- a timestamp
- a serial number
- some random number (optional)
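For comparison, the concatenation could be sketched as follows. This is an assumption-laden illustration: SystemInfoKey is a made-up name, the hex layout with '-' separators is arbitrary, the optional random number is left out, and the process id comes from ProcessHandle, which exists only on Java 9 and later.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of a key built from host address, process id, timestamp and serial. */
final class SystemInfoKey {
    private static final AtomicInteger SERIAL = new AtomicInteger();

    /** Pure formatting step, kept separate so it can be checked in isolation. */
    static String format(byte[] ip, long pid, long timestampMillis, int serial) {
        StringBuilder sb = new StringBuilder();
        for (byte b : ip) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.append('-').append(Long.toHexString(pid))
                 .append('-').append(Long.toHexString(timestampMillis))
                 .append('-').append(Integer.toHexString(serial))
                 .toString();
    }

    /** Gathers the live values; requires Java 9+ for ProcessHandle. */
    static String next() throws UnknownHostException {
        return format(InetAddress.getLocalHost().getAddress(),
                      ProcessHandle.current().pid(),
                      System.currentTimeMillis(),
                      SERIAL.getAndIncrement());
    }
}
```

For example, an address of 10.0.0.255, process id 255, timestamp 4096 and serial 1 would format as 0a0000ff-ff-1000-1.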
pros:
(1) Independent of other systems.
This method does not depend on any other system whose
availability cannot be absolutely assumed; if there's an O/S
there, it'll run.
(2) It may be considered architecturally more elegant to keep the
key generator conceptually separated from the data store.
cons:
(1) Not guaranteed unique.
A non-unique key may be generated in the case of clock
reset/overflow or serial number overflow within the resolution
of the timestamp. This may sound unlikely, but the point is
that it is not a guaranteed unique key that is generated. The
chances of duplicate keys being generated are greatly increased
if you have multiple key generators running in a single process
(but different threads). The tacking on of a random number to
decrease the likelihood of duplicate keys looks like a tacky
way of patching up an algorithm that someone looked at and
decided they weren't quite sure about.
(2) Performance hit.
Each key generation requires the acquisition of a timestamp and
the generation of a random number. Random number generation, in
particular, is not always a light-weight activity.
(3) Poor utilization of key space.
See pro (5) for position one.
(4) Java implementation difficulties.
Since people seem to do this I guess it's possible, but I have
yet to come across a platform-independent way of getting the
current process's ID without implementing a JNI method which
calls getpid() from the standard C library. Indeed, the notion
of a process id is about as portable as the notion of a database
sequence.
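The serial-overflow case in con (1) is easy to reproduce in a toy model. The sketch below is not anyone's real generator: TimestampSerialKeys is a hypothetical class, the serial is deliberately only a few bits wide, and the clock reading is passed in by the caller so that the collision is deterministic.

```java
/** Deterministic toy model of keys made from a timestamp plus a fixed-width serial. */
final class TimestampSerialKeys {
    private final int serialBits; // width of the serial number field
    private int serial;

    TimestampSerialKeys(int serialBits) {
        this.serialBits = serialBits;
    }

    /** The caller supplies the clock reading so the overflow is reproducible. */
    String next(long timestamp) {
        int mask = (1 << serialBits) - 1;
        String key = timestamp + ":" + (serial & mask); // serial wraps at 2^serialBits
        serial++;
        return key;
    }
}
```

With a 2-bit serial, the fifth key requested within one clock tick is identical to the first. A real generator would use a wider serial, which makes the collision rarer but does not rule it out.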
I think this is a pretty fair sort of comparison; feel free to come up
with your own. The 'system information keys' method is a bit light-on
in the pros department, but this might just be because I haven't seen
the light yet.
In response to your slightly personal attack regarding my advocating
this position in a public forum, please note that I am not the only
one to have suggested it, indeed I was not the originator of the idea.
I merely pointed out that we use it and have been defending it since.
Aaron Mulder made the suggestion originally, in post 03531, Rickard
Oberg suggested it as an EJB in post 03566, and there is a similar,
though simpler, method outlined on www.theserverside.com in their
patterns page (note that, to make this one guaranteed unique it
suffers a performance hit).
> --------------------------------------------------------------------------
> Some people mistake the positions I take with the beliefs that I hold.
> --------------------------------------------------------------------------
Don't you hate that?
Regards
Tom
--
Tom Cook - Software Engineer
"We rarely find that people have good sense unless they agree
with us."
- Francois, Duc de la Rochefoucauld
LISAsoft Pty Ltd - www.lisa.com.au
--------------------------------------------------
38 Greenhill Rd. Level 3, 228 Pitt Street
Wayville, SA, 5034 Sydney, NSW, 2000
Phone: +61 8 8272 1555 Phone: +61 2 9283 0877
Fax: +61 8 8271 1199 Fax: +61 2 9283 0866
--------------------------------------------------