Below as quoted by Mamta are my views on this.  I was hoping that
compares involving collation chars would not require twice the
number of objects being created.

The text below describes how store currently uses InstanceGetter to
optimize allocation of objects.  I was hoping to preserve this performance
for current non-collation datatypes and also to avoid needing to
provide any additional collation information after the initial
dvf.instanceGetterFromIdentifiers call.

Just so I know where we are, Dan do you have a problem with the
proposed interfaces, ie. are they in the right place and taking
the right arguments?  If so maybe we could incrementally implement
the interfaces so that I could continue the store side while the
implementation discussion continues.  I would be ok with an initial
interface change that only supported current collation, so that
I could at least verify the store changes.

Mamta, are you close to an implementation?  Maybe you could post a patch so that I could work off of that while discussion continues.

Mamta Satoor wrote:
Hi Dan,
Here are my attempts to answer your questions.

"Why use InstanceGetter here?"
Because Store wants to obtain the InstanceGetter once and call getInstance on it multiple times. This is for efficiency reasons. This is what is currently done, but through interfaces on Monitor rather than DVF. Mike, maybe you can share your thoughts too on why Store does this.

"It doesn't have to return another DVD, it can return itself if it is of
the correct type, thus no additional overhead for UCS_BASIC collation.
Thus this switch would happen once for the first collation, not every
collation, and of course not happen at all if no collation is involved."
I agree, but with the InstanceGetter approach it doesn't even have to happen once, because we will be generating the right DVD in the first place.

"Could you show an example of how the store will be calling the code you
are describing? Maybe that would help me out."
Store would call something like the following (this is copied from what Mike wrote in this same thread, dated April 12th, 2nd mail from Mike, point 3). Again, Mike, if you have more to add from the Store point of view, please do so.

   Store will call the following once:
   InstanceGetter = dvf.instanceGetterFromIdentifiers(format id, collation id)

   Store will call the following many times:
   dvd = InstanceGetter.getNewInstance()
The reason for doing it this way is explained by Mike below:

"3) optimized allocation, caching some of the work. This is used
   where one query may generate large number of rows - for instance
   hash table scan and sorter calls.  Here the idea is to do some
   part of the work once leaving an InstanceGetter which then can
   repeatedly give back new objects in the most optimized way:

again at this point dvd can be used to correctly compare against other
     dvd's in possible collate specific ways."
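
For readers following the thread, the call-once / allocate-many pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Derby code: `Dvd`, `BasicChar`, `CollatedChar`, `Dvf` and the two collation constants are hypothetical stand-ins, the format id is ignored, and `java.text.Collator` stands in for whatever collator the DVF would actually hold.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical stand-in for a DataValueDescriptor; not the real Derby interface.
interface Dvd {
    void setValue(String s);
    int compare(Dvd other);
}

// Character value with default (code unit order) comparison.
class BasicChar implements Dvd {
    String value;
    public void setValue(String s) { value = s; }
    public int compare(Dvd other) {
        return value.compareTo(((BasicChar) other).value);
    }
}

// Character value that compares through a collator shared by all instances.
class CollatedChar extends BasicChar {
    private final Collator collator;
    CollatedChar(Collator collator) { this.collator = collator; }
    @Override
    public int compare(Dvd other) {
        return collator.compare(value, ((BasicChar) other).value);
    }
}

// Hypothetical getter: the per-query work is done once, when it is created.
interface InstanceGetter {
    Dvd getNewInstance();
}

// Hypothetical per-database factory that owns the collator.
class Dvf {
    static final int UCS_BASIC = 0;        // assumed constant
    static final int TERRITORY_BASED = 1;  // assumed constant
    private final Collator collator;

    Dvf(Locale territory) { collator = Collator.getInstance(territory); }

    // The format id is ignored in this sketch; only the collation id matters.
    InstanceGetter instanceGetterFromIdentifiers(int formatId, int collationId) {
        if (collationId == TERRITORY_BASED) {
            return () -> new CollatedChar(collator);
        }
        return BasicChar::new;
    }
}

class StoreCallSketch {
    public static void main(String[] args) {
        Dvf dvf = new Dvf(Locale.ENGLISH);
        // Called once per scan or sort.
        InstanceGetter getter =
            dvf.instanceGetterFromIdentifiers(0, Dvf.TERRITORY_BASED);
        // Called once per row: exactly one object is allocated per value.
        Dvd a = getter.getNewInstance();
        a.setValue("a");
        Dvd b = getter.getNewInstance();
        b.setValue("B");
        System.out.println(a.compare(b) < 0);
    }
}
```

In this sketch the default-collation path returns a plain `BasicChar` and never touches the collator, which is the property the thread wants to preserve for non-collation datatypes.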

thanks,
Mamta
On 4/14/07, Daniel John Debrunner wrote:

    Mamta Satoor wrote:
     > Hi Dan,
     >
     > The problem we are trying to solve is to provide a way for Store to
     > call a method (say it's called
     > getInstanceGetterForFormatIDandCollationType) on DVF with format id &
     > collation type and get an InstanceGetter for that combination.

    Why use InstanceGetter here?

     > Like Mike
     > mentioned in his earlier mail (in this same thread, dated April 12th,
     > 2nd mail from Mike) as point 3), Store will call this method once and
     > call getInstance on that InstanceGetter multiple times to get the right
     > DVD. If we don't change the InstanceGetter as I suggested, then that
     > would mean that we will be creating 2 DVD objects for every character
     > DVD through Store code. The worst part is we will be doing this
     > unnecessary creation of 2 DVDs even for databases which want default
     > collation. The 2 DVD creations I am talking about are: first, through
     > InstanceGetter, we will get say SQLChar. Then at the time of actual
     > collation comparison, it will have to call something like
     > StringDataValue.getCollationValue(int collationType) to get another DVD
     > to make sure that the collation is being performed with the right DVD.

    It doesn't have to return another DVD, it can return itself if it is of
    the correct type, thus no additional overhead for UCS_BASIC collation.
    Thus this switch would happen once for the first collation, not every
    collation, and of course not happen at all if no collation is involved.
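
    A rough sketch of the alternative Dan describes here, where the value hands back itself when it already has the requested collation. The class names and constants are hypothetical, and the extra `Collator` parameter is an assumption added for the sketch; the thread only mentions `getCollationValue(int collationType)`.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical stand-in for a character DVD with default collation.
class PlainChar {
    static final int UCS_BASIC = 0;        // assumed constant
    static final int TERRITORY_BASED = 1;  // assumed constant
    final String value;

    PlainChar(String value) { this.value = value; }

    int compare(PlainChar other) { return value.compareTo(other.value); }

    // Returns this object when no switch is needed, so UCS_BASIC pays nothing.
    PlainChar getCollationValue(int collationType, Collator collator) {
        if (collationType == UCS_BASIC) {
            return this;
        }
        return new TerritoryChar(value, collator);
    }
}

// Hypothetical territory based variant; already the right type for collation.
class TerritoryChar extends PlainChar {
    private final Collator collator;

    TerritoryChar(String value, Collator collator) {
        super(value);
        this.collator = collator;
    }

    @Override
    int compare(PlainChar other) { return collator.compare(value, other.value); }

    @Override
    PlainChar getCollationValue(int collationType, Collator collator) {
        return collationType == TERRITORY_BASED ? this : new PlainChar(value);
    }
}

class CollationValueSketch {
    public static void main(String[] args) {
        Collator english = Collator.getInstance(Locale.ENGLISH);
        PlainChar plain = new PlainChar("a");
        // No allocation when default collation is requested.
        System.out.println(plain.getCollationValue(PlainChar.UCS_BASIC, english) == plain);
        // One extra object the first time a territory based compare is needed.
        PlainChar collated = plain.getCollationValue(PlainChar.TERRITORY_BASED, english);
        System.out.println(collated.getCollationValue(PlainChar.TERRITORY_BASED, english) == collated);
    }
}
```

    The second allocation happens only on the first switch to a territory based value, which is the cost Mamta's InstanceGetter proposal aims to avoid entirely.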

     > What I am suggesting does not make InstanceGetter complicated. It is
     > a pretty simple implementation. All I am proposing is to have a special
     > InstanceGetter class for collation sensitive DVDs. This new
     > InstanceGetter class will have a RuleBasedCollator (which will be set the
     > first time this InstanceGetter is created for the given database through
     > the DVF) and it will have a collation type (this collation type will always
     > be set to whatever collation type
     > getInstanceGetterForFormatIDandCollationType was called with. This
     > collation type will determine which kind of DVD to generate, i.e. one with
     > default collation or one with territory based collation). You mentioned
     > in your mail that "I got a little lost in the details". Please let me
     > know where it was unclear and I can try to explain it better.

    Could you show an example of how the store will be calling the code you
    are describing? Maybe that would help me out.

     >
     > As for your question about "does it take account of the fact that the
     > registered format ids are system wide and there can be databases with
     > different default collations in the same system?" My understanding is
     > that there is one DVF per database and these InstanceGetters will be
     > saved on DVF and hence I do not foresee any problems in having multiple
     > databases with different collations in the same Derby system.

    Dan.


