An update, it turns out that so far the performance problem is not with Derby 
but with the queries that JPA is generating.  I am still going to finish 
separating out the performance from the provisioning and see if there is any 
performance gains to be had and will report back then.

I just wanted to not let this hang around with the implication that Derby is 
not performing when in fact it is the application that is not performing.  You 
know how these emails are archived and live forever ;)

Brett

-----Original Message-----
From: Bergquist, Brett [mailto:[email protected]] 
Sent: Wednesday, March 21, 2012 10:12 PM
To: [email protected]
Subject: RE: Question on whether to use multiple databases/multiple network 
servers or not

Thanks for the response Mike.

The JVM is Java 1.6.0

Currently configured, there are 16 worker threads that are processing 
performance data from the network devices.  These threads are not busy waiting 
for data but rather are pooled and dispatched when there is processing to 
perform.  Playing around with the number of processing threads led to 16 being 
a good number for the current architecture.  I do realize that possibly 
batching the updates to the disk and maybe using one thread to perform the 
database insertions might be better but there is a wrinkle in that before the 
software can respond to the network device that it has received the data, it 
must be ensured that the data is written to stable storage.   So batching up 
the inserts does provide some problems.

A separate thread pool managed by the application server is used to process 
HTTP requests which in turn hit EJB's that perform the provisioning processing. 
 This thread pool is configured for 128 threads right now, but from monitoring 
I have not seen more than 20 in use at one instant.  The request processing is 
synchronous.

The provisioning requests use completely separate tables that the performance 
data.  Only some custom reports actually do some cross table queries and these 
are only run once in a great while.   So there should not be any 
provisioning/performance data table contention.

The database is stored on 2 300Gb small form factor serial drives on a ZFS pool 
that are mirrored.  Right now the log is on the same disk.  So moving the log 
to a separate ZFS pool on the other two mirrored drives is possible and I will 
investigate that.  

There are query/update/delete operations being done by the provisioning 
requests.   Basically read the current data, apply changes, and update is 
usually the processing.

The performance data does two indexed lookups and one insert per performance 
data record.  The indexed lookups are in totally separate tables from the 
provisioning data.   I have implemented a poor man's partitioning by having a 
table per week of the year for the performance data and purging of data is done 
once a week by truncating the tables (where the data is no longer needed).  
This is working great in not trying to insert and delete from the same table.

I am in the process of trying to separate the provisioning and performance into 
two separate databases.  This looks to be pretty straight forward and with 
surprisingly just a few changes.   Simulating this environment is a little more 
challenging as there are about 45K network devices being managed with about 
1600 providing the performance data.   So the provisioning portion of the 
database has records for about 45K devices times multiple copies of the this 
configuration data (backup data to go to a previous know state).   Having 
available 45K real devices is not possible so simulation firmware is run on the 
devices to make 1 device look like 1000.   Similarly, having 1600 real devices 
providing data is done through hardware that simulates multiple devices.

Over the next couple of days I should be able to get this up and running in a 
test environment.

________________________________________
From: Mike Matrigali [[email protected]]
Sent: Wednesday, March 21, 2012 8:25 PM
To: [email protected]
Subject: Re: Question on whether to use multiple databases/multiple network 
servers or not

Bergquist, Brett wrote:
> We have a system in production that has the following characteristics:
>
> *         Provisioning data for various network devices.  There is a
> large number of tables and rows for each device by the data changes 
> infrequently but changes have to be responsive as this data is 
> accessed by a human with a user interface
>
> *         Performance data that is being inserted into the database at
> about 6.5M records per day.  Queries are also done on this data for 15 
> minute intervals and also every 4 hours.  The inserts are non-stop and 
> the queries are periodic.  The inserts need to be responsive as this 
> data is being generated by network devices and needs to keep up.
>
> *         There is one database that contains both kinds of data.
>
>
>
> We are running into a performance problem particularly with provisioning
> data.   Without the performance data being inserted, the provisioning
> changes are performing okay but these are affected a great deal when 
> the performance data is being inserted at such a high rate.

is the provisioning problem with read only transactions, or are they write 
also?  My interpretation is that all tables used by provisioning are different 
than those being inserted by performance.
>
>
>
> There are enough connections to the database engine.
>
>
>
> The system is an Oracle M5000 with 32 processors and 32Gb of memory.
> Looking at CPU utilization and the system is about 10% utilized.    It
> appears that the system is not I/O bound as of yet.
can you talk to number of threads provisioning and performance are using.  
Derby will not do much to break up a single connections work across multiple 
threads.  So a single inserter may be cpu bound but looking at the machine it 
will only be 1/32 utilized, and derby will not go faster for that connection.  
In general derby does a good job of making each incoming connection a different 
thread and running as many of them in parallel as possible as long as there is 
not database lock contention.

What is the disk situation?  One disk, multiple disks, maybe multiple disks 
presented as a single disk?  When talking about moving to multiple dbs, it 
would likely make a lot of sense if you could spread the i/o to multiple disks 
putting one db on each.  Better is if the OS just handles this by presenting 
multiple disks as one.
>
>
>
> I was wondering if it would make sense to separate out the performance 
> data into its own database and potentially its own JVM though a second
> Network Service running.    This will lead to some complexities when
> trying to correlate the performance data back with the provisioning data
> when needed (now to separate databases).    I was wondering if there is
> any thoughts on if this might help separate any contention in the 
> single database that might exist and allow better performance for the 
> provisioning information
>
Any chance you can prototype on a test machine easily first to verify it 
removes the bottleneck.

some bottlenecks in a single database include:
1 log file, so writes are somewhat blocked by other writes, but there is
      software to do "group" commit to optimize this.
1 background processing thread, so background work can become bottle neck, this 
is mostly an issue if you are doing lots of deletes or updates of key fields in 
indexes.

1 disk for data per database, and optional second disk for log.  This is an 
obvious bottleneck if you config has multiple disks that derby is not using.

There are other shared caches per database that are shared but i would be 
surprised if throughput would be affected with such available cpu:
query cache, database cache, open file cache, jvm garbage collector, ...
>
>
> I almost would like it to be that the provisioning database access 
> have higher priority than the performance data since it is infrequent 
> but needs to be responsive.
>
>
>
> Any thought would be greatly appreciated.
>
>
>
> Brett
>





Reply via email to