Hi,

I am working on adding typesafe enum support to Castor JDO, and I have a question that I'd like to get other Castor users' opinions on before I get too far into development. I'd appreciate any comments.

If you caught my RFC when I sent out the first version, you'll remember that I proposed to add several attributes to the field element in the castor mapping file in order to support enumerations. (In the original proposal, it was four new attributes but I have since added a fifth.) A sample:

<!-- enum-value-direct="name" may be used instead of enum-value-method -->
<field name="metasyn"
       type="example.Metasyn"
       enum="true"
       enum-create-method="lookup"
       enum-value-method="getName"
       enum-value-type="string">
  <sql name="metasyn" type="varchar"/>
</field>

(The semantics of all this are covered in the RFC, which is attached:
RFC: add support for typesafe enumeration classes to Castor JDO
Version 0.10

Intro
-----
Enumerations are a common method for ensuring data integrity, both in
software and in relational databases.  As a platform for linking the two,
Castor JDO should provide support for persisting class fields whose type is 
a Java typesafe enumeration.

Typesafe Enum
-------------
The typesafe enumeration class is a standard pattern for implementing enumerations
in Java.  While not explicitly found in the language specification, the pattern
is used by Sun (e.g., java.awt.RenderingHints.Key) and is commonly recommended
as good OO practice.  The Castor project's own Source Generator makes use of it.

A typesafe enum class looks like this:

public class Metasyn {
  /** An optional field or two for storing some intrinsic identifier(s) */
  private final String name;
  private final int ord;
  /** private constructor -- no external instantiation */
  private Metasyn(String name, int ord) {
    this.name = name;
    this.ord = ord;
  }

  public final String getName() { return name; }
  public final int getOrd() { return ord; }

  /* values */
  public static final Metasyn FOO = new Metasyn("foo",0);
  public static final Metasyn BAR = new Metasyn("bar",1);
  public static final Metasyn BAZ = new Metasyn("baz",2);

  /** A method to look up the Metasyn instance for a particular name.
   *  For larger enums, it might be better to use a Map. */
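  public static final Metasyn lookup(String name) {
    // "foo".equals(name) rather than name.equals("foo"), so that a NULL
    // column value yields null instead of a NullPointerException
    if      ("foo".equals(name)) return FOO;
    else if ("bar".equals(name)) return BAR;
    else if ("baz".equals(name)) return BAZ;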
    else return null;
  }

  /** A method to look up the Metasyn instance for a particular ordinal.
   *  For larger enums, it might be better to use a List or an array. */
  public static final Metasyn lookup(int ord) {
    switch (ord) {
      case 0: return FOO;
      case 1: return BAR;
      case 2: return BAZ;
      default: return null;
    }
  }
}

The name and ord fields and their associated lookup methods are not always 
necessary, but they can be convenient.  For JDO, some mechanism like them is 
required. Note that name could be any type of object or primitive -- or might
be omitted.  The important thing is that it be something that JDO can persist.
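The comment on lookup(String) suggests a Map for larger enums. A minimal sketch of that variant, assuming Java 5+ generics; the BY_NAME registry is an illustrative addition, not part of the RFC. Each constant registers itself as it is constructed, so the lookup stays a single hash probe however many values the enum has (the int overload is omitted here).

```java
import java.util.HashMap;
import java.util.Map;

class Metasyn {
    // Must be declared BEFORE the constants: static initializers run in
    // textual order, and the constructor below writes into this map.
    private static final Map<String, Metasyn> BY_NAME = new HashMap<>();

    private final String name;
    private final int ord;

    private Metasyn(String name, int ord) {
        this.name = name;
        this.ord = ord;
        BY_NAME.put(name, this); // self-registration
    }

    public final String getName() { return name; }
    public final int getOrd() { return ord; }

    public static final Metasyn FOO = new Metasyn("foo", 0);
    public static final Metasyn BAR = new Metasyn("bar", 1);
    public static final Metasyn BAZ = new Metasyn("baz", 2);

    /** Returns the instance with the given name; unknown or null names yield null. */
    public static Metasyn lookup(String name) {
        return BY_NAME.get(name);
    }
}
```

The trade-off is the declaration-order dependency noted in the comment; an explicit static block that fills the map after the constants avoids it at the cost of listing each value twice.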

Enums in RDBMSes
----------------
Enumerations are not explicitly supported in standard SQL.  There are three
ways that they have traditionally been handled when creating SQL-based
schemas.

The simplest is to create a column in the data table whose type is that of the
enumeration's values, and to store the enumerated value directly in that column.
A refinement of this method is to apply a constraint to that column so that it
may only contain the values in that enumeration.  There are several possible
downsides to this method, depending on the nature of the enumerated values; that
discussion is outside the scope of this document.
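One way to keep such a constraint in step with the Java class is to generate it from the values the enum actually stores. A minimal sketch, assuming the stored values are the strings returned by getName(); CheckConstraintSketch and checkConstraint are hypothetical names, and while CHECK ... IN is standard SQL, support and enforcement vary by RDBMS.

```java
import java.util.List;

final class CheckConstraintSketch {
    /**
     * Builds a CHECK constraint restricting the column to the given values.
     * Each value is emitted as an SQL string literal with embedded quotes doubled.
     */
    static String checkConstraint(String column, List<String> allowedValues) {
        StringBuilder sql = new StringBuilder("CHECK (").append(column).append(" IN (");
        for (int i = 0; i < allowedValues.size(); i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append('\'').append(allowedValues.get(i).replace("'", "''")).append('\'');
        }
        return sql.append("))").toString();
    }
}
```

For Metasyn's names ("foo", "bar", "baz") this yields CHECK (metasyn IN ('foo', 'bar', 'baz')).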

The second method is a bit more complicated, but potentially much more efficient.
Instead of storing the value directly, create a table containing the enumerated
values mapped to simple keys (like ints).  Then refer to the enumerated values
via their keys in the data table.  For example:

CREATE TABLE metasyn_enum (
  enum_key INTEGER PRIMARY KEY,
  enum_value CHAR(3) UNIQUE
);

CREATE TABLE data (
  other_data VARCHAR(30),
  metasyn INTEGER REFERENCES metasyn_enum(enum_key)
);
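With this method the rows of metasyn_enum must agree with the keys the Java class uses. A hedged sketch of generating those rows from (key, value) pairs, for example each constant's getOrd() and getName(); EnumTableRows is a hypothetical helper and assumes the column names of the schema above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class EnumTableRows {
    /**
     * Returns one INSERT per (key, value) pair for a lookup table shaped like
     * metasyn_enum(enum_key, enum_value). Statement order follows the map's iteration order.
     */
    static List<String> insertStatements(String table, Map<Integer, String> keyToValue) {
        List<String> statements = new ArrayList<>();
        for (Map.Entry<Integer, String> row : keyToValue.entrySet()) {
            // Quote the value as an SQL string literal, doubling embedded quotes
            String literal = "'" + row.getValue().replace("'", "''") + "'";
            statements.add("INSERT INTO " + table + " (enum_key, enum_value) VALUES ("
                    + row.getKey() + ", " + literal + ");");
        }
        return statements;
    }
}
```

Passing a LinkedHashMap keeps the statements in key order; the first row for Metasyn would be INSERT INTO metasyn_enum (enum_key, enum_value) VALUES (0, 'foo');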

The third method is to use a SQL extension that is RDBMS-specific.  For instance,
MySQL provides an ENUM datatype which looks to the user like the first option
but behaves internally like the second.

Enum support in Castor JDO should be able to handle the first two cases.  In the
next section, I'll describe how both might be handled using the same design.

Putting Them Together: Design
-----------------------------
This new functionality should require only small additions to the syntax of
the JDO mapping file.  I propose that the mapping for a typesafe enumeration
field be the following:

<!-- enum-value-direct="name" may be used instead of enum-value-method -->
<field name="metasyn" 
       type="example.Metasyn"
       enum="true"
       enum-create-method="lookup"
       enum-value-method="getName"
       enum-value-type="string">
  <sql name="metasyn" type="varchar"/>
</field>

As you can see, enum support adds five new attributes.  Semantics:

enum: 
    If included and true, indicates that this field is a typesafe enum.  
    Allows the user to use defaults for the other enum- attributes.  Implied
    true if any of the other enum- attributes are specified.
enum-create-method:
    Specifies the method to use to get an instance of the typesafe enum class
    upon loading an object from the database.  It must be the name of a public, 
    static method of the class specified in field#type.  It must take exactly 
    one parameter that is of a type that the sql#type can be coerced into by 
    JDO.  In this sample case, that would be a String.  Default is "valueOf", 
    since that's what Source Generator uses.
enum-value-method:
    Specifies the method to use to get the value to store in the DB out of a
    typesafe enum instance.  It must be the name of a public, non-static method 
    of the class specified in field#type.  It must take no parameters and its 
    return type must be coercible into the sql#type specified.  In this case, 
    String would work.  Default is "toString", since that's what Source Generator 
    provides.  Should not be specified at the same time as enum-value-direct.
enum-value-direct:
    Specifies the field to use to get the value to store in the DB out of a
    typesafe enum instance, as an alternative to enum-value-method.  It must be
    the name of a public field of the class specified in field#type.  Its type
    must be coercible into the sql#type specified.  In this case, String would
    work.  No default.  Should not be specified at the same time as 
    enum-value-method.
enum-value-type:
    An optional attribute, analogous to field#type and, like it, used for
    additional type checking.  If it is specified, the argument to the create
    method and the return type of the value method (or the type of the
    value-direct field) must be this type.  enum-value-type may be one of the
    supported Castor "short names" (like int, string, date, etc.) or a
    fully-qualified class name.  And, of course, it must be coercible into the
    type in the sql element.
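A minimal sketch of how these attributes could be honoured with plain java.lang.reflect. EnumConverter is a hypothetical class, not Castor's API. It assumes enum-value-type has already been resolved to a Class (mapping short names such as "string" to String.class is left out) and is always supplied, because the create method is found by name and parameter type together.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Hypothetical sketch: applies the proposed enum-* attributes via reflection. */
final class EnumConverter {
    private final Method createMethod; // enum-create-method: public static, one parameter
    private final Method valueMethod;  // enum-value-method: public, non-static, no parameters
    private final Field valueField;    // enum-value-direct: public field, or null

    EnumConverter(Class<?> enumClass, Class<?> valueType, String createMethodName,
                  String valueMethodName, String valueDirect) throws ReflectiveOperationException {
        if (valueMethodName != null && valueDirect != null) {
            throw new IllegalArgumentException(
                    "enum-value-method and enum-value-direct are mutually exclusive");
        }
        // Proposed default "valueOf", as generated by Source Generator
        String create = (createMethodName != null) ? createMethodName : "valueOf";
        // getMethod selects the overload whose single parameter is exactly valueType
        createMethod = enumClass.getMethod(create, valueType);
        if (!Modifier.isStatic(createMethod.getModifiers())
                || !enumClass.isAssignableFrom(createMethod.getReturnType())) {
            throw new IllegalArgumentException(
                    create + " must be static and return " + enumClass.getName());
        }
        Class<?> storedType;
        if (valueDirect != null) {
            valueField = enumClass.getField(valueDirect);
            valueMethod = null;
            storedType = valueField.getType();
        } else {
            // Proposed default "toString"
            String value = (valueMethodName != null) ? valueMethodName : "toString";
            valueMethod = enumClass.getMethod(value);
            valueField = null;
            if (Modifier.isStatic(valueMethod.getModifiers())) {
                throw new IllegalArgumentException(value + " must not be static");
            }
            storedType = valueMethod.getReturnType();
        }
        if (!valueType.isAssignableFrom(storedType)) {
            throw new IllegalArgumentException("stored type " + storedType.getName()
                    + " does not match " + valueType.getName());
        }
    }

    /** Load: column value -> enum instance. */
    Object toEnum(Object columnValue) throws ReflectiveOperationException {
        return createMethod.invoke(null, columnValue);
    }

    /** Store: enum instance -> column value. */
    Object toColumn(Object enumInstance) throws ReflectiveOperationException {
        return (valueField != null) ? valueField.get(enumInstance) : valueMethod.invoke(enumInstance);
    }
}
```

For the sample mapping this would be constructed as new EnumConverter(Metasyn.class, String.class, "lookup", "getName", null). Doing all the lookups and checks once, when the mapping is loaded, means a misconfigured mapping fails early rather than on the first load or store.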

I am open to different names for these attributes.

[Other optional attributes on <field> would still be supported (e.g., 
get-method).]

This design would provide for the first two of the traditional methods of 
handling an enum in SQL described above.  Hopefully it is clear how it handles
the first case (direct storage of enumeration values).  This would also deal 
with the MySQL ENUM type described as the third method.

For the second case, the end developer would be required to maintain a field in
the typesafe enum which mapped to the enum_key in the enumeration table, and to
provide an accessor that returned the key as an int (or whatever the key type
was) and a lookup method that accepted one.  Source Generator already provides
half of this capability with its getType method -- it would have to be modified
to add another valueOf method that took an int for full compatibility with this
way of doing things.
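Because Metasyn declares two lookup overloads, the method name alone does not identify the create method. The parameter type does, which is one practical reason for enum-value-type. A minimal sketch of the int-keyed case using java.lang.reflect, assuming enum-create-method="lookup", enum-value-method="getOrd" and the INTEGER enum_key column above. KeyedLookup is a hypothetical name, and Metasyn here is a compact copy of the RFC's class whose ordinal lookup is backed by an array, as its comment suggests.

```java
import java.lang.reflect.Method;

// Compact copy of the RFC's Metasyn: an int key (ord) plus two lookup overloads.
final class Metasyn {
    private final String name;
    private final int ord;

    private Metasyn(String name, int ord) { this.name = name; this.ord = ord; }

    public String getName() { return name; }
    public int getOrd() { return ord; }

    public static final Metasyn FOO = new Metasyn("foo", 0);
    public static final Metasyn BAR = new Metasyn("bar", 1);
    public static final Metasyn BAZ = new Metasyn("baz", 2);

    // Indexed by ordinal; declared after the constants so they are already set.
    private static final Metasyn[] BY_ORD = { FOO, BAR, BAZ };

    public static Metasyn lookup(String name) {
        for (Metasyn m : BY_ORD) {
            if (m.name.equals(name)) return m;
        }
        return null;
    }

    public static Metasyn lookup(int ord) {
        return (ord >= 0 && ord < BY_ORD.length) ? BY_ORD[ord] : null;
    }
}

final class KeyedLookup {
    /** enum_key -> instance; int.class selects lookup(int), not lookup(String). */
    static Metasyn load(int enumKey) throws ReflectiveOperationException {
        Method create = Metasyn.class.getMethod("lookup", int.class);
        return (Metasyn) create.invoke(null, enumKey);
    }

    /** instance -> enum_key, mirroring enum-value-method="getOrd". */
    static int store(Metasyn value) throws ReflectiveOperationException {
        Method valueMethod = Metasyn.class.getMethod("getOrd");
        return (Integer) valueMethod.invoke(value);
    }
}
```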

With this design, JDO shouldn't have to care which internal implementation is 
used.

Revision History
----------------
 0.00 Initial release
 0.01 Fixed typos
 0.10 Added enum-value-type optional attribute
)

It occurred to me that this would add up to a lot of repeated mapping configuration if the same enumeration was used in multiple classes or in multiple fields of the same class. So that is my first question: in actual uses of Castor, how often would you reuse the same enumeration for multiple fields?

If that's common, I have an idea for a different way to map enumerations. Instead of loading all the information into the field element, it would involve creating a new top-level element, something like this:

<!-- value-direct="name" may be used instead of value-method -->
<enum type="examples.Metasyn"
      create-method="lookup"
      value-method="getName"
      value-type="string"
      name="metasyn-string"/>

And then field would have only one new attribute (and it would be optional):

<field name="metasyn"
       type="examples.Metasyn"
       enum="metasyn-string">
  <sql name="metasyn" type="varchar"/>
</field>

The definitions of enum's attributes should be mostly clear by comparison to the older, all-field proposal. The 'name' attribute would provide a way of discriminating between multiple enum mappings of the same class (e.g., it would allow one to map using string keys and another to use ints). If there were only one enum mapping, the class name would be sufficient to align the enum with any fields that used it.
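The matching rule just described (the name when a field gives one, otherwise the class name, which only works when that class has exactly one mapping) can be sketched as a small registry. EnumMapping and EnumMappingRegistry are hypothetical names, not Castor classes, and value-direct is left out for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory form of one top-level <enum> element. */
final class EnumMapping {
    final String type, createMethod, valueMethod, valueType, name;

    EnumMapping(String type, String createMethod, String valueMethod, String valueType, String name) {
        this.type = type;
        this.createMethod = createMethod;
        this.valueMethod = valueMethod;
        this.valueType = valueType;
        this.name = name;
    }
}

/** Hypothetical registry that matches <field> elements to <enum> mappings. */
final class EnumMappingRegistry {
    private final Map<String, EnumMapping> byName = new HashMap<>();
    private final Map<String, EnumMapping> byType = new HashMap<>(); // last mapping seen per class
    private final Map<String, Integer> countByType = new HashMap<>();

    void add(EnumMapping m) {
        if (m.name != null) {
            if (byName.containsKey(m.name)) {
                throw new IllegalArgumentException("duplicate <enum> name: " + m.name);
            }
            byName.put(m.name, m);
        }
        byType.put(m.type, m);
        countByType.merge(m.type, 1, Integer::sum);
    }

    /**
     * Finds the mapping for a field of type fieldType whose optional enum
     * attribute is enumRef; returns null if the type has no <enum> mapping.
     */
    EnumMapping resolve(String fieldType, String enumRef) {
        if (enumRef != null) {
            EnumMapping m = byName.get(enumRef);
            if (m == null || !m.type.equals(fieldType)) {
                throw new IllegalArgumentException("no <enum> named " + enumRef + " for " + fieldType);
            }
            return m;
        }
        Integer count = countByType.get(fieldType);
        if (count == null) {
            return null;
        }
        if (count > 1) {
            throw new IllegalArgumentException(fieldType + " has " + count
                    + " <enum> mappings; the field must name one");
        }
        return byType.get(fieldType);
    }
}
```

Rejecting the ambiguous case, rather than silently picking one mapping, keeps a second <enum> mapping for a class from changing the behaviour of fields that previously relied on the class name.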

To summarize, my questions for you folks are:
1) How often do you use the same typesafe enumeration class in multiple fields in JDO (or would you, if enumerations were supported)?
2) Is the <enum> element a reasonable way to accomplish this, or does it have some failing I'm not seeing? Which method would you prefer, regardless of reuse?

Thanks for your input.

Rhett Sutphin
=====================================================
| Rhett Sutphin
| Research Assistant (Software)
| Coordinated Laboratory for Computational Genomics
| and the Center for Macular Degeneration
| University of Iowa - Iowa City, IA 52242 - USA
| 4111 MEBRF - email: [EMAIL PROTECTED]
=====================================================

On Monday, November 18, 2002, at 11:06 AM, Rhett Sutphin wrote:
Hi,

I would like to add support for persisting Java typesafe enumerations to Castor JDO. Since I'm going to be modifying Castor anyway, I'd like to add them in a reasonably generic manner so that my changes can be incorporated into the main framework.

This seems like a common-enough problem on this list, so I wrote a short design document about how I think it should be done. Before I start implementing it, I would appreciate any comments that the list might have. (I'd particularly like to hear from the main Castor developers, of course, but all comments are welcome.)

Note that one solution has been posted to the list before: http://castor.exolab.org/list-archive/msg18871.html . It is an admirable attempt, but it strikes me as a bit too much of a kluge -- it provides a quick-and-dirty workaround for JDO's inability to persist the classes Source Generator creates for XSD enumerations, but relies too much on the existence of specific-but-unspecified methods in the SG enum classes. Plus, it only handles one way of storing enumerations in the database (i.e., as strings).

Anyway, the design doc is attached. It covers the user-visible portion of the proposed support. Thanks for looking at it.

Rhett
