Re: Implementing using CMP quickly runs in to limitation...

Jonathan K. Weedon Thu, 06 Apr 2000 17:20:03 -0700
Alexander,

<vendor>

You bring up some excellent points.  My responses follow.

> I hope the guys at Inprise did serious performance testing before
> implementing the solution for updates that you described. In my view using
> reflection for generating updates only for changed fields has following
> drawbacks:

Yes, we took the time to verify that our performance optimizations did
in fact improve system throughput.  In fact, we implemented a fairly
functional online book store application (based on TPC-W) over a span
of six months, and used this application to tune the various
optimizations.  During this process, some optimizations were added,
and some were dropped, and we were careful to measure the various
changes in throughput as we made the changes.  A writeup on the final
performance numbers can be obtained from Inprise Sales.  (We were
providing the writeup publicly, but had to remove it since it made
reference to TPC-W, and was not properly audited).

In terms of your particular concerns:

> .- You are going to get a lot of different SQL statements which means that
> database server would have to come up with the execution plan for each of
> them.

In reality, this is not true.  In the real world, applications have a
"locality of behavior", in that it is common that a large number of
transactions do the same sorts of things.  This means that some fields
will tend to be modified all the time, whereas other fields will never
be modified.  Thus, in practice, there is a small number of update
statements that are issued to the database frequently; far fewer than
the theoretical 2^n - 1, where n is the number of fields (one
statement per non-empty subset of fields).  In practice, there
are probably only a handful of different updates that ever occur with
any frequency, which should not overwhelm any real-world database.
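
As a rough illustration of why the statement count stays small, here
is a minimal sketch (modern Java, not the actual Inprise code) that
builds one UPDATE string per distinct set of modified columns and
reuses it afterwards.  The class name UpdateStatementCache, the table
and column names, and the use of a BitSet to mark modified columns are
all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Inprise implementation: caches one UPDATE
// string per distinct set of modified columns.
final class UpdateStatementCache {
    private final String table;      // assumed table name
    private final String keyColumn;  // assumed primary-key column
    private final String[] columns;  // non-key columns, indexed by bit position
    private final Map<BitSet, String> cache = new HashMap<>();

    UpdateStatementCache(String table, String keyColumn, String... columns) {
        this.table = table;
        this.keyColumn = keyColumn;
        this.columns = columns.clone();
    }

    // Returns the SQL for this set of modified columns, building it only once.
    String sqlFor(BitSet modified) {
        if (modified.isEmpty()) {
            throw new IllegalArgumentException("no modified columns");
        }
        // Clone the key so later changes to the caller's BitSet cannot corrupt the map.
        return cache.computeIfAbsent((BitSet) modified.clone(), this::build);
    }

    // Builds "UPDATE t SET c1 = ?, c2 = ? WHERE key = ?" for the set bits.
    private String build(BitSet modified) {
        List<String> assignments = new ArrayList<>();
        for (int i = modified.nextSetBit(0); i >= 0; i = modified.nextSetBit(i + 1)) {
            assignments.add(columns[i] + " = ?");
        }
        return "UPDATE " + table + " SET " + String.join(", ", assignments)
                + " WHERE " + keyColumn + " = ?";
    }

    // Number of distinct UPDATE strings generated so far.
    int distinctStatements() {
        return cache.size();
    }
}
```

If most transactions touch the same few columns, distinctStatements()
stays at a handful no matter how many updates are issued, so the
database sees the same statement text over and over.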

> .- Running comparison through reflection against all fields in EB can be
> expensive.

There are two parts to reflection: one is expensive, and the other is
not.  The actual "reflecting" of the class, to obtain its
java.lang.reflect.Fields, etc., is very expensive.  The act of
"accessing" an object using a precomputed java.lang.reflect.Field is
quite fast.  In code, the slow operation is:

        java.lang.reflect.Field field = object.getClass().getField("theField");

the fast operation is:

        int i = field.getInt(object);

or:

        field.setInt(object, i);

These two are both native C methods which are highly optimized.

So, the cost of getting and setting the container-managed fields using
reflection is negligible, so long as you precompute the reflection of
the fields, which is exactly what we do.
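
To make the split between the slow and fast steps concrete, here is a
minimal sketch in modern Java.  It is only an illustration of the
idea, not our container code: the class FieldAccessSketch, the bean
AccountBean and its fields are hypothetical names.  The Field objects
are looked up once, and every later read goes through the cached
array.

```java
import java.lang.reflect.Field;

final class FieldAccessSketch {
    // Hypothetical bean with public container-managed fields.
    static final class AccountBean {
        public int balance;
        public String owner;
    }

    // Slow step: look the fields up by name once and keep the Field objects.
    static Field[] precompute(Class<?> type, String... names) {
        Field[] fields = new Field[names.length];
        for (int i = 0; i < names.length; i++) {
            try {
                fields[i] = type.getField(names[i]);
            } catch (NoSuchFieldException e) {
                throw new IllegalArgumentException("no public field " + names[i], e);
            }
        }
        return fields;
    }

    // Fast step: read every field of one bean through the precomputed Fields.
    static Object[] snapshot(Field[] fields, Object bean) {
        Object[] values = new Object[fields.length];
        for (int i = 0; i < fields.length; i++) {
            try {
                values[i] = fields[i].get(bean); // primitive values are boxed
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return values;
    }
}
```

A container would call precompute() once per bean class at deployment
time and snapshot() at the start of each transaction to capture the
"before" state.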

The other performance issue has to do with the cost of comparing the
fields with their "before" states.  Here, let's keep in mind the
domain: at least with our CMP engine, as with most of the competing
products, we are concerned only with mapping CMP fields to a
relational database.  Yes, if I have a really complex graph of
objects, this may be expensive to compare against a previous copy, but
I don't generally have such a beast in my Oracle database (at least
not directly, I may have it there stored as a serialized Java object,
e.g., as a BLOB or RAW).  Thus, in the database, I tend to have
strings, numbers, dates, and maybe some BLOBs.

For strings, numbers and dates, hopefully we agree that the cost of
comparing objects is negligible.  The only outstanding issue is BLOBs.
But here, you have to realize that if I am going to get a BLOB in or
out of the database, I need to have a Java byte[] in my hand first.
Well, if I have a Java byte[] in my hand, then comparing it to its
before copy is pretty efficient, at least with a good JIT.  The
algorithm is:

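        // Returns true if a BLOB field's value differs from its "before" copy.
        public boolean isModified(Object oldValue, Object newValue) {
          // Same reference (including both null): unchanged.
          if(oldValue == newValue) {
            return false;
          }
          // Exactly one side is null: changed.
          if(oldValue == null ||
             newValue == null) {
            return true;
          }
          byte[] a = (byte[]) oldValue;
          byte[] b = (byte[]) newValue;
          // Arrays of different lengths can never be equal.
          if(a.length != b.length) {
            return true;
          }
          // Compare byte by byte, stopping at the first difference.
          for(int i = 0; i < a.length; i++) {
            if(a[i] != b[i]) {
              return true;
            }
          }
          return false;
        }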

(Someone might suggest to Sun that adding a memcmp operation would be
helpful, but the above code should compile pretty efficiently
regardless.)
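
For completeness, here is a minimal sketch of how the per-field
comparison might be applied across a whole row of "before" and "after"
values.  It is an assumption-laden illustration in modern Java, not
our engine: the class DirtyCheckSketch and the method modifiedColumns
are hypothetical, and it leans on the standard library's
java.util.Arrays.equals and java.util.Objects.equals for the
element-wise and null-safe comparisons.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Objects;

final class DirtyCheckSketch {
    // Compares one field value against its "before" copy.
    static boolean isModified(Object oldValue, Object newValue) {
        // BLOB-style values: compare the bytes, not the array references.
        if (oldValue instanceof byte[] && newValue instanceof byte[]) {
            return !Arrays.equals((byte[]) oldValue, (byte[]) newValue);
        }
        // Strings, numbers, dates and nulls: rely on equals().
        return !Objects.equals(oldValue, newValue);
    }

    // Returns the positions of all fields whose value changed.
    static BitSet modifiedColumns(Object[] before, Object[] after) {
        if (before.length != after.length) {
            throw new IllegalArgumentException("field count mismatch");
        }
        BitSet modified = new BitSet(before.length);
        for (int i = 0; i < before.length; i++) {
            if (isModified(before[i], after[i])) {
                modified.set(i);
            }
        }
        return modified;
    }
}
```

Note that the "before" value of a byte[] field has to be a copy of the
array, since a change made in place to a shared array would otherwise
go undetected.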

> .- From my experience update of the individual field has negligible overhead
> (as long as it is not the PK), so it should not make any difference whether
> you update 5 or 10 fields.
>  .- Some DBMSs might be able to do this kind of optimization in the database
> and not to update the field (or, more importantly,  its index) if it has not
> been changed. This will work  much faster than doing the same thing in Java
> using reflection.

Here, you need to think about not only the cost of the work being
done, but which tier is doing the work.  If I have to send bigger TCP
packets to the database, and it needs to marshal bigger SQL requests,
then both the network infrastructure and the database machine have
more work to do.

The problem with this approach is that neither the network
infrastructure nor the database machine can be scaled up very far.  On
the other hand, replicating an AppServer in the middle-tier is trivial
(at least with any competent AppServer implementation).  So, I can
have 2 or 5 or 50 AppServer machines sitting in the middle tier, all
accessing a small number of databases over a single network.  Here,
clearly if I can move work out of the databases and out of the network
and into one of my 2, 5, or 50 AppServer machines, I have a huge
improvement in throughput.

Hence, our architecture.

</vendor>

-jkw

"Ananiev, Alexander" wrote:
>
> Hi Louth,
>
> I hope the guys at Inprise did serious performance testing before
> implementing the solution for updates that you described. In my view using
> reflection for generating updates only for changed fields has following
> drawbacks:
> .- You are going to get a lot of different SQL statements which means that
> database server would have to come up with the execution plan for each of
> them. So you won't benefit from the statement (and execution plan) caching
> performed by the DBMS. When you update all fields DBMS uses the same
> statement and same execution plan all the time which is faster.
> .- Running comparison through reflection against all fields in EB can be
> expensive.
> .- From my experience update of the individual field has negligible overhead
> (as long as it is not the PK), so it should not make any difference whether
> you update 5 or 10 fields.
>  .- Some DBMSs might be able to do this kind of optimization in the database
> and not to update the field (or, more importantly,  its index) if it has not
> been changed. This will work  much faster than doing the same thing in Java
> using reflection.
>
> I was facing the same issue when implementing persistence framework for one
> of our clients (not in Java though), and after some testing we realized that
> object flagging and all fields update is much faster than field by field
> comparison.
>
> Just some thoughts...
>
> Alexander "Sasha" Ananiev
> PricewaterhouseCoopers
