Re: Collation implementation WAS Re: Should COLLATION attribute related code go in BasicDatabase?

Mike Matrigali Wed, 21 Mar 2007 09:55:58 -0800


Mamta Satoor wrote:

2) At the time of upgrade of a pre-10.3 database, we should make sure that
the derby.database.collation property with value UCS_BASIC is added to
services.properties. This is because we do not plan on supporting
collation change for existing databases.

Is this required? How does the code handle a soft upgrade database
where this property is not set? Could you say what you plan to do
in both the hard and soft upgrade cases?

I was assuming that only new databases would be affected and that
somehow new code would just work on existing databases with no upgrade
changes at all.  So something like no collation property at all
would be interpreted as UCS_BASIC.  And of course old format SYSCOLUMN
entries would be valid as well as old format conglomerate store metadata.
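
The "no property means UCS_BASIC" reading could be sketched roughly as
below. This is only an illustration: the class and method names are
hypothetical, the property is read from a plain java.util.Properties
object rather than from Derby's real service properties, and the
TERRITORY_BASED value is an assumed example of a non-default setting.

```java
import java.util.Properties;

/** Hypothetical helper; the class and method names are illustrative, not Derby code. */
final class CollationPropertySketch {
    static final String COLLATION_PROPERTY = "derby.database.collation";
    static final String UCS_BASIC = "UCS_BASIC";

    /** Returns the database collation, treating a missing property as UCS_BASIC. */
    static String effectiveCollation(Properties databaseProperties) {
        return databaseProperties.getProperty(COLLATION_PROPERTY, UCS_BASIC);
    }

    public static void main(String[] args) {
        // An existing (pre-10.3) database: the property was never written.
        Properties oldDatabase = new Properties();

        // A newly created database with an explicit, assumed collation value.
        Properties newDatabase = new Properties();
        newDatabase.setProperty(COLLATION_PROPERTY, "TERRITORY_BASED");

        System.out.println(effectiveCollation(oldDatabase));
        System.out.println(effectiveCollation(newDatabase));
    }
}
```

With a lookup like this, neither hard nor soft upgrade would have to
write anything into an existing database for the collation to be known.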

On 3/20/07, *Mamta Satoor* <[EMAIL PROTECTED]> wrote:


    Thanks, Mike and Dan for your responses. Based on this and following
    from Dan's first mail in this thread
    ******start of part of Dan's first mail in this thread*******
    - basic database sets the locale for the DataValueFactory after it
    boots it, using a new method on DVF
            void setLocale(Locale locale);
    ******end of part of Dan's first mail in this thread*******

I may have missed this, is locale information already available
from services.properties?  For the store boot issue store will provide
format id and collation id, but I believe you need locale information
to determine the RuleBasedCollator and it can't depend on anything in
the property conglomerate.

    we do not need the collation attribute information at the DVF boot
    time. It is sufficient to have locale info set on DVF at the boot
    time using setLocale method by basic database. If store code calls
    DVF to give proper DVD using formatid and collation type, DVF can
    determine the correct RuleBasedCollator using the locale if the
    collation type is territory based. So, DVF has everything it needs
    to find the correct RuleBasedCollator for given collation type.
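
A minimal sketch of that idea, under stated assumptions: setLocale(Locale)
and getCharacterCollator(type) are the method names used in this thread,
but the class below is a hypothetical stand-in for the DataValueFactory,
the collation type constants are invented for illustration, and the
method returns java.text.Collator because that is what
Collator.getInstance(Locale) is declared to return.

```java
import java.text.Collator;
import java.util.Locale;

/** Hypothetical stand-in for the DataValueFactory; not Derby's real class. */
final class LocaleAwareFactorySketch {
    // Assumed collation type ids; the real values are not given in this thread.
    static final int COLLATION_TYPE_UCS_BASIC = 0;
    static final int COLLATION_TYPE_TERRITORY_BASED = 1;

    private Locale locale;
    private Collator territoryCollator; // created lazily, then reused

    /** Called once by the database after the factory has booted. */
    void setLocale(Locale locale) {
        this.locale = locale;
        this.territoryCollator = null;
    }

    /** Returns null for UCS_BASIC (plain code point order), else the locale's collator. */
    Collator getCharacterCollator(int collationType) {
        if (collationType == COLLATION_TYPE_UCS_BASIC) {
            return null;
        }
        if (collationType != COLLATION_TYPE_TERRITORY_BASED) {
            throw new IllegalArgumentException("unknown collation type " + collationType);
        }
        if (locale == null) {
            throw new IllegalStateException("setLocale has not been called");
        }
        if (territoryCollator == null) {
            territoryCollator = Collator.getInstance(locale);
        }
        return territoryCollator;
    }
}
```

The point of the sketch is that only the locale has to be known at boot
time; the collation type arrives later, with each request from store.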

    I will go ahead and remove the following requirement from
    Outstanding items under
    http://wiki.apache.org/db-derby/BuiltInLanguageBasedOrderingDERBY-1478
    1)Add jdbc url attribute COLLATION into services.properties as
    derby.database.collation property. If no COLLATION is specified at
    database create time, then have UCS_BASIC as the value for
    derby.database.collation. We need the property in the
    services.properties rather than properties conglomerate because
    DataValueFactory <http://wiki.apache.org/db-derby/DataValueFactory>
    needs this property before store has been booted completely.

    In addition, I will add an entry as follows under Implemented Items on
    http://wiki.apache.org/db-derby/BuiltInLanguageBasedOrderingDERBY-1478
    At database create time, the optional JDBC url attribute
    COLLATION is validated by the boot code in data dictionary and the
    validated value of the COLLATION attribute (if none is specified by
    the user, it defaults to UCS_BASIC, which is also the only collation
    available on pre-10.3 databases) is saved as the
    derby.database.collation property in the properties conglomerate.
    This work was done by revision 511283.

    As always, any feedback is welcomed,
    Mamta

On 3/20/07, *Mike Matrigali* <[EMAIL PROTECTED]> wrote:



        Mamta Satoor wrote:

 Mike, I am not sure if your question, about how in store DVD with
 correct collation type is loaded, was answered or not. In other
 words, you had question about following piece of pseudo code from Dan

     if (dvd instanceof StringDataValue)
             dvd = dvd.getValue(dvf.getCharacterCollator(type));

 Let me attempt to answer it. It will help clear up things in my mind too
 and make sure that I am understanding this correctly.

 Currently,
 derby.impl.store.access.conglomerate.OpenConglomerateScratchSpace has
 get_row_for_export, which first gets a class template row using
 RowUtil.newClassInfoTemplate. This method in RowUtil calls
 Monitor.classFromIdentifier to get the InstanceGetter for each of the
 format ids identified by store. Once
 OpenConglomerateScratchSpace.get_row_for_export has the class template
 row, it will call RowUtil.newRowFromClassInfoTemplate. This is the
 method Dan is proposing to modify, i.e. store should pass an additional
 array of int to RowUtil.newRowFromClassInfoTemplate which will have the
 collation type associated with the format ids of the template row.
 RowUtil.newRowFromClassInfoTemplate will first get the DVD as it does
 today using following
     columns[column_index] =
         (DataValueDescriptor) classinfo_template[column_index].getNewInstance();

 In addition, it will need to do something like following
     if (columns[column_index] instanceof StringDataValue)
             dvd = columns[column_index].getValue(
                 dvf.getCharacterCollator(collationTypesForTemplateRows[column_index]));

        My opinion is that this work should be done in the datavalue
        factory and not outside.  Dan suggested at one point that some of
        the work of generating classes/instances should move from Monitor
        to the datavalue factory.

        So I was assuming something like RowUtil.newClassInfoTemplate,
        instead of calling Monitor.classFromIdentifier(format_ids[i]) to
        get an array of InstanceGetters, would call something like
        datavaluefactory.classFromIdentifier(format_ids[i], collator_ids[i]) -
        then every InstanceGetter would produce the right type with
        collator set from then on.


        Internal to dvf it can do the work of checking for instanceof if it
        needs to, but because it is inside dvf maybe it can do something
        smarter.
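
One way to picture this suggestion is the sketch below. It is not
Derby code: the class, the format ids, the collation ids and the two
value classes are hypothetical, and java.util.function.Supplier stands
in for InstanceGetter. The format id and collation id are resolved once
per template slot, so creating a row later involves no instanceof
checks.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these types are Derby's real classes. */
final class TemplateFactorySketch {
    static final int FORMAT_INT = 1;                 // assumed format ids
    static final int FORMAT_CHAR = 2;
    static final int COLLATION_UCS_BASIC = 0;        // assumed collation ids
    static final int COLLATION_TERRITORY_BASED = 1;

    /** Minimal stand-ins for DataValueDescriptor implementations. */
    static final class IntValue { }
    static final class CharValue {
        final Collator collator; // null means UCS_BASIC ordering
        CharValue(Collator collator) { this.collator = collator; }
    }

    private final Collator territoryCollator;

    TemplateFactorySketch(Locale locale) {
        this.territoryCollator = Collator.getInstance(locale);
    }

    /** Resolves format id and collation id once; the getter does no further checks. */
    Supplier<Object> classFromIdentifier(int formatId, int collationId) {
        switch (formatId) {
            case FORMAT_INT:
                return IntValue::new;
            case FORMAT_CHAR: {
                Collator collator =
                    (collationId == COLLATION_TERRITORY_BASED) ? territoryCollator : null;
                return () -> new CharValue(collator);
            }
            default:
                throw new IllegalArgumentException("unknown format id " + formatId);
        }
    }

    /** Builds one getter per column, analogous to a class info template. */
    List<Supplier<Object>> newClassInfoTemplate(int[] formatIds, int[] collationIds) {
        List<Supplier<Object>> template = new ArrayList<>();
        for (int i = 0; i < formatIds.length; i++) {
            template.add(classFromIdentifier(formatIds[i], collationIds[i]));
        }
        return template;
    }

    /** Creates a fresh row; every string value already carries its collator. */
    static Object[] newRowFromClassInfoTemplate(List<Supplier<Object>> template) {
        Object[] row = new Object[template.size()];
        for (int i = 0; i < row.length; i++) {
            row[i] = template.get(i).get();
        }
        return row;
    }
}
```

The template is built once per conglomerate, so the per-column decision
is paid at template creation rather than on every row.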


 Dan, let me know if I understood you right. This will help me answer
 your question on the Derby wiki page
 http://wiki.apache.org/db-derby/BuiltInLanguageBasedOrderingDERBY-1478
 I know that we don't need to get into the implementation code details in
 the design phase, but I need to be able to picture this particular case
 in my mind to understand where I am going.

 thanks,
 Mamta


 On 3/15/07, *Mike Matrigali* <[EMAIL PROTECTED]> wrote:




    Daniel John Debrunner wrote:
     > Mamta Satoor wrote:
     >
    ...

     >
     > - At recovery time the btree uses the collation type and the data
     > value factory to setup its template row array correctly.
     > Something like
     >      for each dvd in row array
     >         if (dvd instanceof StringDataValue)
     >              dvd = dvd.getValue(dvf.getCharacterCollator(type));


    Note that the store issue is not just a recovery time issue, templates
    are required during normal runtime.  Creation of these templates used
    to show up (a long time ago) in performance analysis and work was done
    to optimize the performance.  So I am interested in making these
    template creations as efficient as possible.

    Your proposal above does not look right to me - it could just be I
    don't understand where the pseudo code is.  The code I expect in store
    would be something like below - letting the datavalue factory do
    whatever is right based on the format id and the collation, if store
    is going to "own" knowing the collation of a given column then I would
    expect something like:


    for each format id in row array
        dvd = datavaluefactory.getObject(format id, character_collator_type)


    note this means extra overhead for every object creation in the
    template.

    To me it seems unfortunate to pass in this info per column, when at
    least in the current 10.3 code it is one per database.  I saw the
    direction as:

    o 10.3 only needs one collation per database so hide the info in the
      datavalue factory, basically there is one DEFAULT collation per
      database.  Thus no need for second argument to
      datavaluefactory.getObject()

    o future release needs to have different collations per conglomerate,
      then at that time we can store a collator type per conglomerate - we
      have mechanism today to upgrade on the fly.  If we want to support
      adding a collation to an existing database I would suggest continuing
      the DEFAULT collation concept with some magic number representing
      DEFAULT db collation in the datavaluefactory.getObject() call - which
      would mean use db wide default rather than specify specific one. For
      new databases we would not need default, we could at that time
      specify one per conglomerate.
      At this point we either change all the datavaluefactory.getObject()
      calls to have 2 args and support DEFAULT_VALUE as second argument, or
      maybe support both 1 and 2 arg calls - not sure.

    o future future release needs to have different collations per column,
      then at that time we can store a collator type per column - we
      continue to have mechanism to upgrade on fly as long as we can come
      up with a default value for old tables.  Same issues as above.
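
The DEFAULT collation idea in the list above, with both 1 and 2 arg
calls supported, might look something like the following sketch. The
getObject name comes from the pseudo code earlier in this mail;
everything else (the class, the constants, the magic number -1, and the
Value stand-in) is a hypothetical illustration rather than Derby code.

```java
/** Hypothetical sketch of a factory with a database-wide DEFAULT collation. */
final class DefaultCollationSketch {
    static final int COLLATION_DEFAULT = -1;          // assumed magic number
    static final int COLLATION_UCS_BASIC = 0;         // assumed collation ids
    static final int COLLATION_TERRITORY_BASED = 1;

    /** Minimal stand-in for a created value; records what it was created with. */
    static final class Value {
        final int formatId;
        final int collationType;
        Value(int formatId, int collationType) {
            this.formatId = formatId;
            this.collationType = collationType;
        }
    }

    private final int databaseCollation;

    DefaultCollationSketch(int databaseCollation) {
        if (databaseCollation == COLLATION_DEFAULT) {
            throw new IllegalArgumentException("database collation must be concrete");
        }
        this.databaseCollation = databaseCollation;
    }

    /** 1 arg form: callers that know nothing about collation get the db default. */
    Value getObject(int formatId) {
        return getObject(formatId, COLLATION_DEFAULT);
    }

    /** 2 arg form: an explicit collation wins, DEFAULT falls back to the db-wide one. */
    Value getObject(int formatId, int collationType) {
        int resolved =
            (collationType == COLLATION_DEFAULT) ? databaseCollation : collationType;
        return new Value(formatId, resolved);
    }
}
```

Old conglomerates that never stored a collation type would pass DEFAULT
(or use the 1 arg form), which is what would allow upgrade on the fly.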



     >
     > - setting the collation property remains in the data dictionary
     >
     > - basic database sets the locale for the DataValueFactory after it
     > boots it, using a new method on DVF
     >         void setLocale(Locale locale);
     >
     > I think approaching the problem this way will lead to a cleaner
     > solution in the long term and be somewhat easier to implement.
     >
     > Thanks,
     > Dan.