Hi,
We were also a bit surprised by this behaviour. I think three factors are
related to your observation:
a) OS-level (or SAN) caching
b) distributed files (I am not sure about JPlus files; are they
self-distributed? Is there anything like a distribution routine?)
c) group locks taken on the selected (part) file
The information below applies to jBase 4 and may or may not be relevant to
jBase 5 or JPlus files:
When jBase does a full scan of a file, it has to traverse the file group by
group. The process needs to take a "group lock" on each block of the file (a
"group") to avoid dirty reads and to synchronize all concurrently running
processes (some may be writing to and some reading from the same group at the
same time).
In jBase 4 these group locks are taken using the "native" (OS) locking
mechanism, regardless of whether you use jRLA or not (with jRLA, record locks
are taken using the pthreads library and are actually maintained as semaphores
in a shared memory segment).
(To my knowledge) group locks have nothing to do with record-level locks. They
are short-lived and held only while the content of a record is physically read
or written (record locks, by contrast, are kept for as long as your program
holds the lock or the jBase transaction is active).
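To make the group-by-group traversal concrete, here is a minimal sketch in
Python. It is not jBase code: HashedFile, count_records and the use of
threading.Lock are hypothetical stand-ins for the file, the COUNT scan and the
OS-level group lock described above.

```python
import threading


class HashedFile:
    """Hypothetical model of a hashed file: a list of groups, one lock each."""

    def __init__(self, groups):
        self.groups = groups  # each group is a list of record IDs
        self.locks = [threading.Lock() for _ in groups]  # one "group lock" per group


def count_records(hashed_file):
    """Full scan: visit every group in order, holding its lock only briefly."""
    total = 0
    for group, lock in zip(hashed_file.groups, hashed_file.locks):
        with lock:  # exclusive, short-lived group lock (assumed semantics)
            total += len(group)  # read the IDs while the group cannot change
    return total
```

Each lock is released before the next group is touched, so a scan never holds
more than one group lock at a time.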
I think the above description also applies to your jBase 5 situation: none of
the SELECT processes can finish before the 1st SELECT has completed. The 1st
SELECT is a greedy process: once it finishes processing group 1, it releases
group lock 1 and immediately takes the lock on group 2. Another SELECT process
would have to be very fast to process group 1 and take the lock on group 2
before the 1st SELECT does. It is theoretically possible, but practically
impossible in my opinion.
Practically impossible because:
a) taking a lock takes time (OS-level locking does not perform well on all
kinds of architectures, e.g. it is much slower than semaphores on IBM P7
servers)
b) jBase needs to get the IDs from the locked group, which (in the case of a
poorly sized file) means a few more block reads. Of course, the data is usually
already in the OS/SAN cache because the 1st SELECT process did this work.
"Taking" the IDs also requires a few cycles.
c) (not 100% sure) in the case of distributed files with selection criteria
given, SELECT needs to apply the filtering criteria to each ID it finds. That
means a lot of extra work.
It also seems to me that jBase (at least 4.x) currently invokes the
distribution routine for each ID it sees (unnecessarily and inexplicably). This
adds a lot of overhead to the processing (even when the distribution routine is
very simple). That was our observation, if I remember correctly.
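The "follower cannot overtake the leader" effect can be illustrated with a
small simulation. This is only a sketch under simplifying assumptions that are
mine, not jBase facts: locks are exclusive and granted in request order, a
group costs uncached_cost time units the first time it is read and cached_cost
afterwards, and all names (simulate_scans and its parameters) are hypothetical.

```python
def simulate_scans(start_times, n_groups, uncached_cost, cached_cost):
    """Return the finish time of each scan, in start order.

    Assumptions: one exclusive lock per group, granted in request order;
    the first scan to read a group pays uncached_cost, later scans pay
    cached_cost because the data is then in the OS/SAN cache.
    """
    free_at = [0] * n_groups      # time at which each group lock is next free
    cached = [False] * n_groups   # has any scan already read this group?
    finish_times = []
    for start in sorted(start_times):
        t = start
        for g in range(n_groups):
            t = max(t, free_at[g])  # wait until the group lock is released
            t += cached_cost if cached[g] else uncached_cost
            cached[g] = True
            free_at[g] = t          # lock is released once the group is read
        finish_times.append(t)
    return finish_times


# Leader starts at t=0, follower starts halfway through the leader's scan.
print(simulate_scans([0, 5000], n_groups=100, uncached_cost=100, cached_cost=1))
```

With these numbers the leader finishes at 10000 and the follower at 10001: the
follower races through the cached groups, catches up, and then stays one group
behind the leader until the end, which matches two COUNTs with very different
elapsed times returning at the same second.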
The last thing I would like to share my thoughts on is the general locking
concept. The bad thing is that these group locks seem to be exclusive locks. In
my opinion they could theoretically be shared read locks. However, I may be
wrong or missing something.
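For readers unfamiliar with the distinction, the sketch below shows what
"shared read" semantics would mean: any number of readers may hold the lock
together, while a writer needs it alone. It is a generic Python illustration,
not how jBase implements group locks, and SharedExclusiveLock is a hypothetical
name. This simple version can starve writers if readers keep arriving.

```python
import threading


class SharedExclusiveLock:
    """Minimal shared/exclusive lock: many readers or exactly one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # number of current shared holders
        self._writer = False   # True while an exclusive holder exists

    def acquire_shared(self):
        with self._cond:
            while self._writer:            # readers only wait for a writer
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()    # a waiting writer may proceed

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers:  # writer needs the lock alone
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()        # wake waiting readers and writers
```

With locks of this kind, concurrent SELECT/COUNT scans would not have to queue
behind each other on every group; only writers would serialize them.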
Finally: the jBase team spotted some of these problems and introduced an
alternative method of group locking in jBase. Since jBase 4.1.5.37, semaphores
may be used as the group locking mechanism. That should perform much better on
P7. I think the change was also prepared for jBase 5. We are in the middle of
testing; it looks very promising, although we had one concern.
By the way: which jBase version and what kind of server are you running on?
What criteria did you give to SELECT/COUNT? How do jPlus files perform?
Group members: please correct me if I wrote something misleading or untrue.
Kind regards
Pawel
-----Original Message-----
From: DD <[email protected]>
Sender: [email protected]
Date: Tue, 4 Sep 2012 01:57:53
To: <[email protected]>
Reply-To: [email protected]
Subject: Strange COUNT/SELECT behavior on AIX
Dear all,
We are migrating our application (T24) on jBASE from HP-UX to AIX. We are
experiencing some strange behavior of SELECT (and COUNT) on large files
(JPlus, >2GB). Apart from being slower on AIX (which may have to do with
SAN), there is something else that I cannot explain.
If I start the same COUNT from different ssh sessions, all finish at
exactly the same time:
jsh t24adm ~ -->date
Tue Aug 21 15:23:59 METDST 2012
jsh t24adm ~ -->COUNT FBNK.RE.CONSOL.SPEC.ENTRY
12482387 Records counted
jsh t24adm ~ -->date
Tue Aug 21 *17:46:16* METDST 2012
The above process took almost 2.5 hours. The one below I started much later
in the same environment from another session, but both came back at the
very same second:
jsh t24adm ~ -->time COUNT FBNK.RE.CONSOL.SPEC.ENTRY
12482403 Records counted
usr: 11.51 sys: 14.82 elapsed: 47m33.02s
jsh t24adm ~ -->date
Tue Aug 21 *17:46:16* METDST 2012
This is not a coincidence, I saw it many times.
Any ideas?
--
--
IMPORTANT: T24/Globus posts are no longer accepted on this forum.
To post, send email to [email protected]
To unsubscribe, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/jBASE?hl=en