I'm currently developing a system that uses a user-defined set of rules, constructed
from a fixed set of conditions and actions, to define business rules. A significant
portion of these conditions involve reasonably complex SQL queries that fetch the
relevant data from a database that will likely exceed 50GB. This runs on a batch
basis, after a load from an external system.
For each object in the DB (just under a million of them), the system uses Jess as a
sort of sieve, running each object through by generating a set of ordered facts based
on the conditions desired: the first part of each fact is built from the condition
name and parameters, and the second item comes from executing the query in question
with the conditions specified by the user.
The user defines something like:
if response time of transition at temp 50 > response time of transition at temp 10 * 1.5
then transition is high temperature sensitive
The user was able to specify the 50, the 10, and the * 1.5, and to select the conditions
themselves arbitrarily from a fixed set (they could have been comparing the response
time to the cost in cents if they wanted to...).
The queries would be:
select max(response_time) from sample where transition_id = 123 and temperature > 50 and ...
select max(response_time) from sample where transition_id = 123 and temperature < 10 and ...
These would get placed into facts:
(response-trans-temp-50 55000)
(response-trans-temp-10 15000)
(defrule Rule-Sample-Rule
  (response-trans-temp-50 ?vresponse-trans-temp-50)
  (response-trans-temp-10 ?vresponse-trans-temp-10)
  (test (> ?vresponse-trans-temp-50 (* 1.5 ?vresponse-trans-temp-10)))
  =>
  (assert (temp-sensitive 1.0)))
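To make the bottleneck concrete, here is a minimal sketch of the per-object flow described above, with an in-memory SQLite table standing in for the real schema (all table and column names here are made up for illustration). The point is the shape of the work: one query round trip per condition per object.

```python
import sqlite3

# Toy stand-in for the real SAMPLE table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (transition_id INTEGER, temperature REAL, response_time REAL)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                 [(123, 55, 55000), (123, 60, 40000),
                  (123, 5, 15000), (123, 8, 12000)])

def condition_fact(name, transition_id, where_sql, params):
    """Run one condition query and render the result as an ordered fact."""
    (value,) = conn.execute(
        "SELECT MAX(response_time) FROM sample WHERE transition_id = ? AND " + where_sql,
        (transition_id,) + params).fetchone()
    return "(%s %s)" % (name, value)

# One round trip per condition per object -- with ~1M objects and several
# conditions each, this is where the time goes.
fact_hot = condition_fact("response-trans-temp-50", 123, "temperature > ?", (50,))
fact_cold = condition_fact("response-trans-temp-10", 123, "temperature < ?", (10,))
print(fact_hot)   # (response-trans-temp-50 55000.0)
print(fact_cold)  # (response-trans-temp-10 15000.0)
```

The fact strings would then be asserted into the Jess engine before firing the rules for that object.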
Currently the system executes the queries just before putting the facts into Jess,
which has proved too slow. I was wondering if anyone out there has used Jess in a
similar manner and faced the same sort of bottleneck I'm experiencing. I'm about to
implement an elaborate system of temporary-table generation, based on the rules
specified in the rule base, by generating SQL something like:
insert into COND_01 (TRANSITION_ID, RESPONSE_10)
select TRANSITION_ID,
       MAX(RESPONSE_TIME)
  from TRANSITION, SAMPLE
 where ...
 group by TRANSITION_ID;
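For what it's worth, the buffering step above amounts to one set-based pass per condition instead of a million round trips: a single GROUP BY query yields the condition value for every transition at once, and the facts can then be asserted from that result set in memory. A sketch (Python/SQLite again, hypothetical schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (transition_id INTEGER, temperature REAL, response_time REAL)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                 [(123, 55, 55000), (123, 60, 40000),
                  (456, 70, 90000), (456, 80, 70000)])

# One query per *condition*, not per object: the equivalent of filling COND_01
# for every transition in a single scan.
cond_50 = dict(conn.execute(
    """SELECT transition_id, MAX(response_time)
         FROM sample
        WHERE temperature > ?
        GROUP BY transition_id""", (50,)))

# In the real system each transition is run through Jess separately, so the
# transition id keys which object's fact set the value belongs to.
facts = ["(response-trans-temp-50 %s)" % cond_50[tid] for tid in sorted(cond_50)]
```

The per-transition lookup into cond_50 then replaces a per-object query at fact-generation time.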
I'm expecting (ahh... maybe guessing) that I won't see more than 50 conditions that
justify this sort of treatment, i.e. 50 tables for buffering data.
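One variation I've been toying with, assuming those buffer tables all share TRANSITION_ID as a key (table and column names below are hypothetical): once the condition values are buffered per transition, the user's comparison itself can be pushed into SQL as a join, so only the transitions that actually satisfy a rule's test ever need to become facts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical buffer tables, one row per transition per condition.
conn.execute("CREATE TABLE cond_50 (transition_id INTEGER PRIMARY KEY, response REAL)")
conn.execute("CREATE TABLE cond_10 (transition_id INTEGER PRIMARY KEY, response REAL)")
conn.executemany("INSERT INTO cond_50 VALUES (?, ?)", [(123, 55000), (456, 20000)])
conn.executemany("INSERT INTO cond_10 VALUES (?, ?)", [(123, 15000), (456, 18000)])

# The user's test "resp@50 > 1.5 * resp@10" evaluated set-wide in one join;
# only the matching transitions would need to be asserted into Jess.
sensitive = [tid for (tid,) in conn.execute(
    """SELECT a.transition_id
         FROM cond_50 a JOIN cond_10 b ON a.transition_id = b.transition_id
        WHERE a.response > 1.5 * b.response""")]
```

Whether that beats letting Jess do the test depends on how many rules share conditions, so I'm not sure it's a net win.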
I'm wondering if I've missed an obvious error in knowledge representation that has
imposed this problem on me, or if, as I suspect, this is a common problem that has been
solved in a better way than generating temporary tables on a condition-by-condition
basis, which has the potential to balloon into a data-volume scalability problem as
bad as or worse than my roundtrip/query response-time problem (although I do have plenty
of disk space at this point). Any suggestions out there? And no, a hardware upgrade is
not in the cards :(
Thanks,
Jamison Masse
--------------------------------------------------------------------
To unsubscribe, send the words 'unsubscribe jess-users [EMAIL PROTECTED]'
in the BODY of a message to [EMAIL PROTECTED], NOT to the list
(use your own address!) List problems? Notify [EMAIL PROTECTED]
--------------------------------------------------------------------