Re: Strange COUNT/SELECT behavior on AIX

Pawel Piaskowy Tue, 04 Sep 2012 07:47:39 -0700

Hi,

We were also a bit surprised by this behaviour. I think three factors are 
related to your observation:
a) OS-level (or SAN) caching
b) distributed files (I am not sure about JPlus files: are they 
self-distributed? Is there anything like a distribution routine?)
c) group locks taken on the selected (part) file


The information below applies to jBase 4 and may or may not be relevant to 
jBase 5 or JPlus files:
When jBase does a full scan of a file, it has to traverse the file group by 
group. The process needs to take a "group lock" on each block of the file 
("group") to avoid dirty reads and to synchronize all concurrently running 
processes (some could be writing and some reading the same group at the same 
time).
In jBase 4 these group locks are taken using the "native" (OS) locking 
mechanism, regardless of whether you use jRLA or not (with jRLA, record locks 
are taken using the pthreads library and are actually maintained as semaphores 
in a shared memory segment).
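
To make the idea of a group-by-group scan under native OS locks concrete, 
here is a minimal sketch in Python using POSIX byte-range locks. It is only 
an illustration of the pattern described above, not jBase's actual 
implementation: the 4096-byte group size, the function name scan_groups and 
the use of fcntl.lockf are all my assumptions.

```python
import fcntl
import os

GROUP_SIZE = 4096  # hypothetical size of one "group" (block) in bytes


def scan_groups(path, group_size=GROUP_SIZE):
    """Read a file group by group, holding an exclusive byte-range lock
    on each group only while that group is being read."""
    groups_read = 0
    # lockf needs the file open for writing to take an exclusive lock.
    with open(path, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        for offset in range(0, size, group_size):
            # Blocks until no other process holds a conflicting lock
            # on this byte range.
            fcntl.lockf(f, fcntl.LOCK_EX, group_size, offset)
            try:
                f.seek(offset)
                block = f.read(group_size)
                # A real scan would extract the record IDs from `block` here.
                groups_read += 1
            finally:
                # Release this group before moving on to the next one.
                fcntl.lockf(f, fcntl.LOCK_UN, group_size, offset)
    return groups_read
```

The lock is held only for the duration of one block read, which is the 
short-lived behaviour described for group locks, as opposed to record locks.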

(To my knowledge) group locks have nothing to do with record-level locks. They 
are short-lived and held only while the content of a record is physically read 
or written (record locks are kept for as long as your program holds the lock 
or the jBase transaction is active).

I think the above description also applies to your jBase 5 situation: none 
of the other SELECT processes can finish before the 1st SELECT has completed. 
The 1st SELECT is a greedy process: once it finishes processing group 1 it 
releases group lock 1 and immediately takes the lock on group 2. Some other 
SELECT process would have to be very fast to process group 1 and take the 
lock on group 2 before the 1st SELECT does. It is theoretically possible, but 
practically impossible in my opinion.

Practically impossible because:
a) taking a lock takes time (OS-level locking does not perform well on all 
kinds of architectures, e.g. it is much slower than semaphores on IBM P7 
servers)
b) jBase needs to get the IDs from the locked group, which (in the case of a 
poorly sized file) means a few more block reads. Of course the information is 
usually already in the OS/SAN cache because the 1st SELECT process did this 
work. "Taking" the IDs also requires a few cycles.
c) (not 100% sure) in the case of distributed files with selection criteria 
given, SELECT needs to apply the filtering criteria to each ID it finds. That 
means a lot of extra work.
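
The effect can be reproduced with a toy model. The sketch below is a 
simplified simulation, not a measurement of jBase: it assumes exclusive 
per-group locks, a scanner that takes the next group's lock the instant it 
releases the current one (so a later scanner never overtakes an earlier 
one), and invented per-group costs for a cold read (first scanner) and a 
warm, cached read (later scanners). The function name and all numbers are 
hypothetical.

```python
def simulate_scans(num_groups, start_times, cold_cost, warm_cost):
    """Return the finish time of each scanner.

    start_times must be sorted ascending. Because no scanner can overtake
    an earlier one in this model, scanners can be simulated one after
    another in start order.
    """
    free_at = [0.0] * num_groups   # time at which each group lock is next free
    cached = [False] * num_groups  # has some scanner already read this group?
    finish_times = []
    for start in start_times:
        t = start
        for g in range(num_groups):
            t = max(t, free_at[g])  # wait until the group lock is released
            t += warm_cost if cached[g] else cold_cost
            cached[g] = True
            free_at[g] = t          # lock released when processing is done
        finish_times.append(t)
    return finish_times


# First scan starts at t=0 on a cold cache; second starts at t=500.
first, second = simulate_scans(1000, [0.0, 500.0], cold_cost=1.0, warm_cost=0.01)
print(first, second)
```

In this run the first scan ends at 1000.0 and the second at about 1000.01, 
even though the second scan on its own would need only about 10 time units. 
It catches up with the first scan roughly halfway through the file and is 
then stuck one group behind it until the end, which matches two COUNTs 
started at different times returning in the same second.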

It also seems to me that jBase (at least 4.x) currently invokes the 
distribution routine for each ID it sees (unnecessarily, and for no reason I 
can explain). This adds a lot of overhead to the processing (even when the 
distribution routine is very simple). That was our observation, if I remember 
correctly.

The last thing I would like to share my thoughts on is the general locking 
concept. The bad thing is that these group locks seem to be exclusive locks. 
In my opinion they could theoretically be shared read locks. However, I may be 
wrong or be missing something.

Finally: the jBase team spotted some of these problems and introduced an 
alternative method of group locking. Since jBase 4.1.5.37, semaphores may be 
used as the group locking mechanism. That should perform much better on P7. I 
think the change was also prepared for jBase 5. We are in the middle of 
testing; it looks very promising, although we had one concern.

By the way: which jBase release and what kind of server are you running on? 
What criteria did you give to SELECT/COUNT? How do jPlus files perform?

Group members: please correct me if I wrote something misleading or something 
that is not true.

Kind regards
Pawel

Sent from BlackBerry®

-----Original Message-----
From: DD <[email protected]>
Sender: [email protected]
Date: Tue, 4 Sep 2012 01:57:53 
To: <[email protected]>
Reply-To: [email protected]
Subject: Strange COUNT/SELECT behavior on AIX

Dear all,

We are migrating our application (T24) on jBASE from HP-UX to AIX. We are 
experiencing some strange behavior of SELECT (and COUNT) on large files 
(JPlus, >2GB). Apart from being slower on AIX (which may have to do with 
SAN), there is something else that I cannot explain.

If I start the same COUNT from different ssh sessions, all finish at 
exactly the same time:

jsh t24adm ~ -->date
Tue Aug 21 15:23:59 METDST 2012
jsh t24adm ~ -->COUNT FBNK.RE.CONSOL.SPEC.ENTRY

12482387 Records counted

jsh t24adm ~ -->date
Tue Aug 21 *17:46:16* METDST 2012

The above process took almost 2.5 hours. The below one I started much later 
on the same environment from another session, but they came back at the 
very same second:

jsh t24adm ~ -->time COUNT FBNK.RE.CONSOL.SPEC.ENTRY

12482403 Records counted

usr: 11.51   sys: 14.82   elapsed: 47m33.02s
jsh t24adm ~ -->date
Tue Aug 21 *17:46:16* METDST 2012

This is not a coincidence, I saw it many times.
Any ideas?

-- 
-- 
IMPORTANT: T24/Globus posts are no longer accepted on this forum.

To post, send email to [email protected]
To unsubscribe, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/jBASE?hl=en
