I'm currently developing a system that uses a user-defined set of rules, constructed
from a fixed set of conditions and actions, to define business rules. A significant
portion of these conditions involve reasonably complex SQL queries that fetch the
relevant data from a database that will likely exceed 50GB. This runs on a batch
basis, after a load from an external system.
For each object in the DB (just under a million of them), the system uses Jess as a
sort of sieve, running each object through by generating a set of ordered facts based
on the conditions desired: the first part of each fact is built from the condition
name and parameters, and the second item comes from executing the query in question
with the conditions specified by the user.
The user defines something like:
if response time of transition at temp 50 > response time of transition at temp 10 * 1.5
then transition is high temperature sensitive
The user was able to specify the 50, the 10, and the * 1.5, and to select the conditions
themselves arbitrarily from a fixed set (they could have been comparing the response
time to the cost in cents if they wanted to...).
The queries would be:
select max(response_time) from sample where transition_id = 123 and temperature > 50 and ...
select max(response_time) from sample where transition_id = 123 and temperature < 10 and ...
These would get placed into facts:
(response-trans-temp-50 55000)
(response-trans-temp-10 15000)
(defrule Rule-Sample-Rule
  (response-trans-temp-50 ?vresponse-trans-temp-50)
  (response-trans-temp-10 ?vresponse-trans-temp-10)
  (test (> ?vresponse-trans-temp-50 (* 1.5 ?vresponse-trans-temp-10)))
  =>
  (assert (temp-sensitive 1.0)))
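To make the bottleneck concrete, here is a minimal sketch of the per-object flow described above, with an in-memory SQLite table standing in for the real schema (all table and column names here are made up for illustration). The point is the shape of the work: one query round trip per condition per object.

```python
import sqlite3

# Toy stand-in for the real SAMPLE table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (transition_id INTEGER, temperature REAL, response_time REAL)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                 [(123, 55, 55000), (123, 60, 40000),
                  (123, 5, 15000), (123, 8, 12000)])

def condition_fact(name, transition_id, where_sql, params):
    """Run one condition query and render the result as an ordered fact."""
    (value,) = conn.execute(
        "SELECT MAX(response_time) FROM sample WHERE transition_id = ? AND " + where_sql,
        (transition_id,) + params).fetchone()
    return "(%s %s)" % (name, value)

# One round trip per condition per object -- with ~1M objects and several
# conditions each, this is where the time goes.
fact_hot = condition_fact("response-trans-temp-50", 123, "temperature > ?", (50,))
fact_cold = condition_fact("response-trans-temp-10", 123, "temperature < ?", (10,))
print(fact_hot)   # (response-trans-temp-50 55000.0)
print(fact_cold)  # (response-trans-temp-10 15000.0)
```

The fact strings would then be asserted into the Jess engine before firing the rules for that object.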
Currently the system executes the queries just before putting the facts into Jess,
which has proved too slow. I was wondering if anyone out there has used Jess in a
similar manner and faced the same sort of bottleneck I'm experiencing. I'm about to
implement an elaborate system of temporary-table generation, based on the rules
specified in the rule base, by generating SQL something like:
insert into COND_01 (TRANSITION_ID, RESPONSE_10)
select TRANSITION_ID,
       MAX(RESPONSE_TIME)
  from TRANSITION, SAMPLE
 where ...
 group by TRANSITION_ID;
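For what it's worth, the buffering step above amounts to one set-based pass per condition instead of a million round trips: a single GROUP BY query yields the condition value for every transition at once, and the facts can then be asserted from that result set in memory. A sketch (Python/SQLite again, hypothetical schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (transition_id INTEGER, temperature REAL, response_time REAL)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)",
                 [(123, 55, 55000), (123, 60, 40000),
                  (456, 70, 90000), (456, 80, 70000)])

# One query per *condition*, not per object: the equivalent of filling COND_01
# for every transition in a single scan.
cond_50 = dict(conn.execute(
    """SELECT transition_id, MAX(response_time)
         FROM sample
        WHERE temperature > ?
        GROUP BY transition_id""", (50,)))

# In the real system each transition is run through Jess separately, so the
# transition id keys which object's fact set the value belongs to.
facts = ["(response-trans-temp-50 %s)" % cond_50[tid] for tid in sorted(cond_50)]
```

The per-transition lookup into cond_50 then replaces a per-object query at fact-generation time.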
I'm expecting (ahh... maybe guessing) that I won't see more than 50 conditions that
justify this sort of treatment, i.e. 50 tables for buffering data.
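One variation I've been toying with, assuming those buffer tables all share TRANSITION_ID as a key (table and column names below are hypothetical): once the condition values are buffered per transition, the user's comparison itself can be pushed into SQL as a join, so only the transitions that actually satisfy a rule's test ever need to become facts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical buffer tables, one row per transition per condition.
conn.execute("CREATE TABLE cond_50 (transition_id INTEGER PRIMARY KEY, response REAL)")
conn.execute("CREATE TABLE cond_10 (transition_id INTEGER PRIMARY KEY, response REAL)")
conn.executemany("INSERT INTO cond_50 VALUES (?, ?)", [(123, 55000), (456, 20000)])
conn.executemany("INSERT INTO cond_10 VALUES (?, ?)", [(123, 15000), (456, 18000)])

# The user's test "resp@50 > 1.5 * resp@10" evaluated set-wide in one join;
# only the matching transitions would need to be asserted into Jess.
sensitive = [tid for (tid,) in conn.execute(
    """SELECT a.transition_id
         FROM cond_50 a JOIN cond_10 b ON a.transition_id = b.transition_id
        WHERE a.response > 1.5 * b.response""")]
```

Whether that beats letting Jess do the test depends on how many rules share conditions, so I'm not sure it's a net win.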
I'm wondering if I've missed an obvious error in knowledge representation that has
imposed this problem on me, or if, as I suspect, this is a common problem that has been
solved in a better way than generating temporary tables on a condition-by-condition
basis, which has the potential to balloon into a data-volume scalability problem as
bad as or worse than my roundtrip/query response-time problem (although I do have plenty
of disk space at this point). Any suggestions out there? And no, a hardware upgrade is
not in the cards :(
Thanks,
Jamison Masse
--------------------------------------------------------------------
To unsubscribe, send the words 'unsubscribe jess-users [EMAIL PROTECTED]'
in the BODY of a message to [EMAIL PROTECTED], NOT to the list
(use your own address!) List problems? Notify [EMAIL PROTECTED]
--------------------------------------------------------------------