RE: JESS: Jess performance question

Adair, John Mon, 19 Jun 2006 17:26:39 -0700

Thanks to everyone who answered. Since in my more general case the
matching logic for each rule may contain arbitrary predicate logic,
I was hesitant to go down the road of using facts instead of rules.

However, since everyone said that my problem was the flatness of my
ruleset, I introduced hierarchy to the ruleset by adding an extra
slot to my template which effectively represented a bucketed version
of the original slot, and an extra condition to each rule matching
the bucket first. It seems (although this is based on intuition and
approximate measurements) that the optimal bucket size is the square
root of the number of distinct slot values. I was able to reduce my
173 minutes down to 7.

I thought that adding another layer of bucketing might help even more,
but it seems that the additional overhead in the rule creation and
matching offset any gains.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of friedman_hill ernest j
Sent: Monday, June 19, 2006 10:23 AM
To: [email protected]
Subject: Re: JESS: Jess performance question

Hi John,

The other folks in this thread have covered some of the same ground in
a more theoretical way; let me try to make the discussion more
explicit. The point I'm trying to illustrate is that Jess is optimized
for cross-correlations between facts, a facility that you *could* use
here, but aren't.

So, to begin. A rule like this:

(defrule rule-1
  (foo (bar 1))
  => )

is compiled to (slightly simplifying) three nodes in the Rete network:
one that allows through only "foo" facts, which feeds into one that
allows through only facts with "1" in the "bar" slot, which feeds into
the node that activates the rule. If you imagine that you have 500,000
rules which each test for the literal values 1...500,000, then that
first node feeds into 500,000 nodes which each test for one literal
value. Every "foo" fact has to be tested against every literal value,
so the overall performance is proportional to the number of facts
times the number of rules.

Now, imagine instead a rule like this:

(defrule rule-2
  (match (data ?x))
  (foo (bar ?x))
  => )

Again, slightly simplifying, but this rule compiles into four nodes:
one that passed "data" facts, one that passes "foo" facts, both of
which feed into one that correlates data facts and foo facts
containing matching data, which then feeds into the rule-activating
node. 

That correlating node is really smart, and the magic that it does is
what makes Jess so fast. The rule compiler understands that the "data"
slots in the "match" facts have to match the "bar" slots in the "foo"
facts, and so the node has some indexes built into it. When a "foo" or
"match" fact is asserted, potential partners are looked up using the
value of "?x" as a key. Because of the way the indices are organized,
each fact has to be compared against only a small number of other
facts.

Compare this to the description of rule-1 above. If there are
1000 "foo" facts, then with 500,000 rules like rule-1, then there are
a total of 5 million tests to activate all the matching rules. Instead
if we have 500,000 "match" facts and 1000 "foo" facts and we're using
rule-2, then (depending on the dynamical history of the application,
hash collisions, etc) we might do as few as 1000 tests -- i.e., the
performance is linear in the number of "foo" facts and independent of
the number of patterns to be matched.

You dismissed Bob's examples as being too simple, and that's most
likely either because (a) they match any possible values on the LHS,
and you want to match only certain values, and/or (b) they derive the
method call arguments directly from the fact data, and the
relationship isn't necessarily so simple. So I'd propose something
like this, which you could generalize to any number of arguments:

(deftemplate MAIN::R
   (slot _r (type OBJECT))
   (slot a (type STRING)))

(deftemplate MAIN::match
   (slot data)
   (slot argument))

(defrule rule-3
   (match (data ?data) (argument ?arg))
   (R (a ?data) (_r ?o)) 
   => 
   (?o a ?arg))

Then instead of generating rules, you want to generate facts. Note
that the match/argument slot could contain a String literal containing
Jess code, which you could execute using the "eval" function on the
rule RHS -- so this really is very general.

I think Adair, John wrote:
> Thanks to you both. Unfortunately, as I tried to emphasize below, the
> apparent
> structure in the values being matched is an artifact of the data
> generation, and
> will not hold in general.
> 
> The case I'm testing, where I have large numbers of rules with an
> exceedingly
> simple structure, is just one special case of a more general problem.
I
> have
> already considered the possibility of solving this use case in a
> different way,
> such as through database or hashtable lookup, and am in the process of
> determining
> whether that is necessary by doing performance testing on Jess and
other
> search
> engines. Given that rules engines work great for most of my other use
> cases, where
> there is a natural branching structure, it would obviously be best for
> me if I
> could use the same tool everywhere.
> 
> Let's say I'm stuck with a ruleset structure similar to the one below.
> Is the best
> matching time I can hope for from Jess linear in the number of rules?
I
> know Jess
> has tuning parameters, such as the node index hash value mentioned in
> Dr. Friedman-Hill's
> book, which can affect performance of different kinds of rules. Are
> there any parameters
> that might help here?
> 
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> On Behalf Of Orchard, Bob
> Sent: Friday, June 16, 2006 8:20 PM
> To: [email protected]
> Subject: RE: JESS: Jess performance question
> 
> As Dave says it's not clear what you are trying to do but one rule
could
> do all
> of this and I suspect there would be performance gains ...
> 
>  (defrule oneRule 
>   (R (a ?a) (_r ?o)) 
>  => 
>   (?o a (integer (sub-string 2 (str-length ?a) ?a)))
>  )
> 
> 
> Simpler if slot 'a' contains an integer, then 
> 
>  (defrule oneRule 
>   (R (a ?a) (_r ?o)) 
>  => 
>   (?o a ?a)
>  )
> 
> BUT with assumption that you really have strings that are always
> "annnnn" in slot a ... with 'a' as the first character of the string
and
> then
> the rest of the characters are digits.
> 
> Bob Orchard
> 
> 
> 
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> On Behalf Of [EMAIL PROTECTED]
> Sent: Friday, June 16, 2006 5:03 PM
> To: [email protected]
> Subject: Re: JESS: Jess performance question
> 
> 
> Hello,
> 
> It is unclear what your intentions are, but it may be that you are
> simply using the wrong tool for the task.  Another possibility is that
> you have inverted the use of facts and rules with respect to your
> problem.  (I.e., usually facts are used to represent data, rather than
> rules.)
> 
> To understand why you're seeing such poor performance, you should
study
> how the Rete algorithm works.  It is the basis for efficient pattern
> matching in JESS.
> 
> Loosely speaking, a large ruleset will perform better if the patterns
> can grow the Rete network into a heavily branching tree.  In your
case,
> the automated ruleset generation appears to be tending a plush lawn of
> grass. The result is that every fact asserted is being compared to
each
> and every blade!  You'll need to partition your ruleset by adding
> additional patterns so that facts are quickly matched against entire
> branches at once.  Look on the web site for tips on writing and
ordering
> rule patterns.
> 
> Hope this helps,
> 
> Dave Barnett
> 
> On Fri, 16 Jun 2006, Adair, John wrote:
> 
> > I have a ruleset that looks like this:
> > (deftemplate MAIN::R
> >    (slot _r (type OBJECT))
> >    (slot a (type STRING)))
> > (defrule rule-0 (R (a "a0") (_r ?o)) => (?o a 0))
> > ..
> > (defrule rule-499999 (R (a "a499999") (_r ?o)) => (?o a 499999))
> >
> > The fact that the numeric part of the rule name and the int argument

> > in the method call are the same is not a coincidence, but the 
> > particular value of the "a" slot that I'm matching on is because of 
> > the data I generated the ruleset from, not because the rules will 
> > always look like this.
> >
> > I seem to be spending a lot of time asserting facts, the main time 
> > cost of which I assume to be in pattern matching. When I have 5,000 
> > rules, it takes about 32 seconds to assert 100,000 facts. When I
have 
> > 20,000 rules, it takes about 160 seconds. This worse-than-linear 
> > degradation continues as I add rules approaching my target of
500,000 
> > rules (which I haven't been able to run successfully yet).
> >
> > Is there anything I can do to my ruleset to improve performance?
> >
----------------------------------------------------------------------
> > --
> > <John S. Adair><[EMAIL PROTECTED]><It's turtles all the way
> down.>
> >
>
------------------------------------------------------------------------
> >
> >
> > --------------------------------------------------------------------
> > To unsubscribe, send the words 'unsubscribe jess-users 
> > [EMAIL PROTECTED]' in the BODY of a message to [EMAIL PROTECTED],
NOT
> 
> > to the list (use your own address!) List problems? Notify 
> > [EMAIL PROTECTED]
> > --------------------------------------------------------------------
> >
> 
> --------------------------------------------------------------------
> To unsubscribe, send the words 'unsubscribe jess-users
[EMAIL PROTECTED]'
> in the BODY of a message to [EMAIL PROTECTED], NOT to the list (use
> your own address!) List problems? Notify [EMAIL PROTECTED]
> --------------------------------------------------------------------
> 
> 
> --------------------------------------------------------------------
> To unsubscribe, send the words 'unsubscribe jess-users
[EMAIL PROTECTED]'
> in the BODY of a message to [EMAIL PROTECTED], NOT to the list
> (use your own address!) List problems? Notify
> [EMAIL PROTECTED]
> --------------------------------------------------------------------
> --------------------------------------------------------------------
> To unsubscribe, send the words 'unsubscribe jess-users
[EMAIL PROTECTED]'
> in the BODY of a message to [EMAIL PROTECTED], NOT to the list
> (use your own address!) List problems? Notify
[EMAIL PROTECTED]
> --------------------------------------------------------------------
> 

---------------------------------------------------------
Ernest Friedman-Hill  
Advanced Software Research          Phone: (925) 294-2154
Sandia National Labs                FAX:   (925) 294-2234
PO Box 969, MS 9012                 [EMAIL PROTECTED]
Livermore, CA 94550         http://herzberg.ca.sandia.gov
--------------------------------------------------------------------
To unsubscribe, send the words 'unsubscribe jess-users [EMAIL PROTECTED]'
in the BODY of a message to [EMAIL PROTECTED], NOT to the list
(use your own address!) List problems? Notify
[EMAIL PROTECTED]
--------------------------------------------------------------------
--------------------------------------------------------------------
To unsubscribe, send the words 'unsubscribe jess-users [EMAIL PROTECTED]'
in the BODY of a message to [EMAIL PROTECTED], NOT to the list
(use your own address!) List problems? Notify [EMAIL PROTECTED]
--------------------------------------------------------------------

RE: JESS: Jess performance question

Reply via email to