[openejb-dev] On Improving CMP Primary Key Generators

Aaron Mulder Thu, 16 Jun 2005 19:18:51 -0700

        So we have a feature where you can create a primary key generator 
-- an object that (usually) masks some underlying DB-based mechanism for 
producing unique IDs for new objects.  For example, MySQL has 
AUTO_INCREMENT columns, Oracle and PostgreSQL have Sequence objects, 
or the user could just create a new table that holds the current ID 
count for other tables, etc.


        It seems like there are two ways to use this:

1) A single EJB has a single associated key generator.  So the generated  
IDs are presumably in some kind of order, used by that EJB only, and 
other beans would operate off a different sequence, table, etc.  This is 
required for AUTO_INCREMENT columns, which of course cannot be shared 
across multiple tables.

2) There's one key generator that produces some kind of generally unique 
IDs, and several EJBs share it.  For example, if you have several similar 
tables and you want all their IDs to come from one pool.

        Currently, there are two places where this is configured.  An EJB 
has a reference to the GBean Name of a key generator GBean.  And then 
there's a declaration for the key generator GBean itself, with any 
configuration it requires (like the name of a sequence, the data source to 
use, etc.).  This means for the two scenarios above, if you have an EJB 
JAR with 10 CMP entities:

in 1) Each of the 10 entities declares a GBean reference to a separate key
generator GBean.  There are then 10 GBeans declared, one for each key
generator.  Each GBean points to a specific table or sequence, the shared
data source, etc.  Presumably, the GBeans are declared in the
openejb-jar.xml, since they are not meaningful to EJBs in other JARs
(unless they use the same underlying tables!).

in 2) Each of the 10 entities declares a GBean reference to the same key
generator GBean.  There's a single GBean declared for the key generator.  
It points to the sequence or table or whatever, and the data source.  The
GBean may be declared at the EJB JAR or application level, depending on
where the EJBs are that will need to use it.

        To me, the first scenario is clearly not ideal -- I don't like
needing custom configuration in two places for every bean, and I don't
really like requiring a user to correctly configure GBeans.  There's also
some duplicate configuration, such as the data source, which is provided
for the EJBs and also for the key generator.  Finally, if the GBean is
given a proper JSR-77 GBean Name, it's not so easy for the EJBs to
correctly identify it (they need the application and module IDs, a type
and name need to be selected for each, etc.).  I also note that the one
example I've seen (in that place, cough cough, I'm not supposed to name)
did not use a correct JSR-77 GBean Name, which is not a very attractive
workaround.

        The second scenario, while somewhat more palatable on account of 
only requiring one GBean, has many of the same issues.  There's still the 
repeated configuration, the GBean Name, etc.  Also, in my personal usage, 
I've run into lots and lots of situations like the first scenario, and not 
too many like the second, so the advantage (based on my experience) is in 
the wrong place.

        So to get around to my point, I'd like to propose an alternative:

 - We make the key generator into a "key generator factory" -- that is, 
something that knows how to produce a certain type of key generator (based 
on sequence or table or auto increment or whatever), but is not itself a 
generator and does not have EJB-specific configuration attached.

 - We provide some additional XML elements where the EJB defines a key 
generator -- to let you provide configuration data there.  It would also 
let you refer to an arbitrary GBean like now, to handle custom generators.

 - In our code, when we process an EJB with a reference to a generator, we 
construct some kind of properties object, pop in the data source for the 
EJB, and add any properties specified in the new config elements.  We pass 
the properties object to the named generator factory, and it returns a 
configured generator for the EJB to use.  So the factory would have a 
method like "public KeyGenerator createGenerator(Hashtable properties)".

 - We define our 3 known generator factories in the standard server plans, 
so they'll have known/expected GBean Names and types.

 - We use a conventional reference structure to refer to the generator 
factory, meaning you can provide a name only (generator-link), the full 
GBean Name (generator-name), or the components of the generator GBean Name 
(generator with 6 child elements).

 - We code the needed generator factories accordingly, though they should 
be pretty lightweight.

 - Not strictly related, but... We change the key generator class names to 
be WAAAY shorter :)
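
        As a rough illustration of the factory flow in the list above, here 
is a minimal Java sketch.  Only the createGenerator(Hashtable) shape comes 
from the proposal; the nextKey() method, the "DataSource" property key, and 
the class names InMemoryGeneratorFactory and GeneratorWiring are 
assumptions made up for this example, and the factory counts in memory 
instead of running SQL so that it runs without a database.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

/** Hypothetical generator contract; the real interface may differ. */
interface KeyGenerator {
    long nextKey();
}

/** Factory shape from the proposal: properties in, configured generator out. */
interface KeyGeneratorFactory {
    KeyGenerator createGenerator(Hashtable<String, Object> properties);
}

/**
 * Stand-in for a SQL-backed factory. A real one would run the "SQL" property
 * against the supplied data source; this one counts in memory.
 */
class InMemoryGeneratorFactory implements KeyGeneratorFactory {
    public KeyGenerator createGenerator(Hashtable<String, Object> properties) {
        if (!properties.containsKey("SQL")) {
            throw new IllegalArgumentException("missing required property: SQL");
        }
        final long[] next = {1};      // each generator gets its own counter
        return () -> next[0]++;
    }
}

class GeneratorWiring {
    /** Builds the properties object handed to the factory for one EJB. */
    static Hashtable<String, Object> buildProperties(Object ejbDataSource,
                                                     Map<String, String> configParams) {
        Hashtable<String, Object> props = new Hashtable<>();
        // The EJB's own data source goes in first ("DataSource" is an assumed key).
        props.put("DataSource", ejbDataSource);
        // Values from the EJB's config-param elements are added afterwards,
        // so a configured entry with the same key would replace the default.
        props.putAll(configParams);
        return props;
    }

    public static void main(String[] args) {
        KeyGeneratorFactory factory = new InMemoryGeneratorFactory();
        Map<String, String> params = new HashMap<>();
        params.put("SQL", "SELECT SOME_SEQUENCE.NEXTVAL FROM DUAL");

        // One shared factory, one configured generator per EJB.
        KeyGenerator forFirstEjb = factory.createGenerator(buildProperties("ejbDataSource", params));
        KeyGenerator forSecondEjb = factory.createGenerator(buildProperties("ejbDataSource", params));
        System.out.println(forFirstEjb.nextKey());
        System.out.println(forSecondEjb.nextKey());
    }
}
```

        The point of the sketch is only that the factory holds no 
EJB-specific state: everything bean-specific arrives in the properties 
object, and each call returns a separate generator.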

        So I'm envisioning the configuration will look something like 
this:

SERVER PLAN

<gbean name="SQLGenerator"
      class="org.openejb...SQLGeneratorFactory" />

EJB PLAN

<enterprise-beans>
  <entity>
    ...
    <automatic-key-generation>
      <generator-link>SQLGenerator</generator-link>
      <config-param>
        <config-param-name>SQL</config-param-name>
        <config-param-value>
          SELECT SOME_SEQUENCE.NEXTVAL FROM DUAL
        </config-param-value>
      </config-param>
    </automatic-key-generation>
  </entity>
  <entity>
    ...
    <automatic-key-generation>
      <generator-link>SQLGenerator</generator-link>
      <config-param>
        <config-param-name>SQL</config-param-name>
        <config-param-value>
          SELECT OTHER_SEQUENCE.NEXTVAL FROM DUAL
        </config-param-value>
      </config-param>
      <config-param>
        <config-param-name>BlockSize</config-param-name>
        <config-param-value>10</config-param-value>
      </config-param>
    </automatic-key-generation>
  </entity>
</enterprise-beans>

        So this pretty much solves my issues from before:

1) The factory naming is standard, and the factory can be referred to by 
-link shortcuts instead of a full GBean Name (if desired).

2) There's only one block of key generation data for each EJB

3) The user doesn't need to write GBean definitions unless they want
functionality we don't provide (and we cut down on the overall number
of GBean declarations).

4) The EJB silently passes its own data source to the key generator, 
though we could provide a facility to override that with a configured 
data source GBean Name for our factories easily enough.

I think it would also be possible to eliminate the generic "config-param" 
elements in favor of more specific elements for each of the 3 generator types we 
support, but then we'd need to change the XML if and when we add new 
generators.  I could go either way on that one -- I like less generic XML, 
but I also like stable schemas.


Anyway, feedback would be appreciated, but in the absence of any, I'll 
take a stab at this when I have some coding cycles.  I don't think I have 
TranQL commit though, so I'll need to run some patches through someone.

Thanks,
        Aaron
