Thanks to everyone who answered. Since in my more general case the matching logic for each rule may contain arbitrary predicate logic, I was hesitant to go down the road of using facts instead of rules.
However, since everyone said that my problem was the flatness of my ruleset, I introduced hierarchy to the ruleset by adding an extra slot to my template which effectively represented a bucketed version of the original slot, and an extra condition to each rule matching the bucket first. It seems (although this is based on intuition and approximate measurements) that the optimal bucket size is the square root of the number of distinct slot values. I was able to reduce my 173 minutes down to 7. I thought that adding another layer of bucketing might help even more, but it seems that the additional overhead in the rule creation and matching offset any gains. -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of friedman_hill ernest j Sent: Monday, June 19, 2006 10:23 AM To: [email protected] Subject: Re: JESS: Jess performance question Hi John, The other folks in this thread have covered some of the same ground in a more theoretical way; let me try to make the discussion more explicit. The point I'm trying to illustrate is that Jess is optimized for cross-correlations between facts, a facility that you *could* use here, but aren't. So, to begin. A rule like this: (defrule rule-1 (foo (bar 1)) => ) is compiled to (slightly simplifying) three nodes in the Rete network: one that allows through only "foo" facts, which feeds into one that allows through only facts with "1" in the "bar" slot, which feeds into the node that activates the rule. If you imagine that you have 500,000 rules which each test for the literal values 1...500,000, then that first node feeds into 500,000 nodes which each test for one literal value. Every "foo" fact has to be tested against every literal value, so the overall performance is proportional to the number of facts times the number of rules. Now, imagine instead a rule like this: (defrule rule-2 (match (data ?x)) (foo (bar ?x)) => ) Again, slightly simplifying, but this rule compiles into four nodes: one that passed "data" facts, one that passes "foo" facts, both of which feed into one that correlates data facts and foo facts containing matching data, which then feeds into the rule-activating node. That correlating node is really smart, and the magic that it does is what makes Jess so fast. The rule compiler understands that the "data" slots in the "match" facts have to match the "bar" slots in the "foo" facts, and so the node has some indexes built into it. When a "foo" or "match" fact is asserted, potential partners are looked up using the value of "?x" as a key. Because of the way the indices are organized, each fact has to be compared against only a small number of other facts. Compare this to the description of rule-1 above. If there are 1000 "foo" facts, then with 500,000 rules like rule-1, then there are a total of 5 million tests to activate all the matching rules. Instead if we have 500,000 "match" facts and 1000 "foo" facts and we're using rule-2, then (depending on the dynamical history of the application, hash collisions, etc) we might do as few as 1000 tests -- i.e., the performance is linear in the number of "foo" facts and independent of the number of patterns to be matched. You dismissed Bob's examples as being too simple, and that's most likely either because (a) they match any possible values on the LHS, and you want to match only certain values, and/or (b) they derive the method call arguments directly from the fact data, and the relationship isn't necessarily so simple. So I'd propose something like this, which you could generalize to any number of arguments: (deftemplate MAIN::R (slot _r (type OBJECT)) (slot a (type STRING))) (deftemplate MAIN::match (slot data) (slot argument)) (defrule rule-3 (match (data ?data) (argument ?arg)) (R (a ?data) (_r ?o)) => (?o a ?arg)) Then instead of generating rules, you want to generate facts. Note that the match/argument slot could contain a String literal containing Jess code, which you could execute using the "eval" function on the rule RHS -- so this really is very general. I think Adair, John wrote: > Thanks to you both. Unfortunately, as I tried to emphasize below, the > apparent > structure in the values being matched is an artifact of the data > generation, and > will not hold in general. > > The case I'm testing, where I have large numbers of rules with an > exceedingly > simple structure, is just one special case of a more general problem. I > have > already considered the possibility of solving this use case in a > different way, > such as through database or hashtable lookup, and am in the process of > determining > whether that is necessary by doing performance testing on Jess and other > search > engines. Given that rules engines work great for most of my other use > cases, where > there is a natural branching structure, it would obviously be best for > me if I > could use the same tool everywhere. > > Let's say I'm stuck with a ruleset structure similar to the one below. > Is the best > matching time I can hope for from Jess linear in the number of rules? I > know Jess > has tuning parameters, such as the node index hash value mentioned in > Dr. Friedman-Hill's > book, which can affect performance of different kinds of rules. Are > there any parameters > that might help here? > > -----Original Message----- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > On Behalf Of Orchard, Bob > Sent: Friday, June 16, 2006 8:20 PM > To: [email protected] > Subject: RE: JESS: Jess performance question > > As Dave says it's not clear what you are trying to do but one rule could > do all > of this and I suspect there would be performance gains ... > > (defrule oneRule > (R (a ?a) (_r ?o)) > => > (?o a (integer (sub-string 2 (str-length ?a) ?a))) > ) > > > Simpler if slot 'a' contains an integer, then > > (defrule oneRule > (R (a ?a) (_r ?o)) > => > (?o a ?a) > ) > > BUT with assumption that you really have strings that are always > "annnnn" in slot a ... with 'a' as the first character of the string and > then > the rest of the characters are digits. > > Bob Orchard > > > > -----Original Message----- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > On Behalf Of [EMAIL PROTECTED] > Sent: Friday, June 16, 2006 5:03 PM > To: [email protected] > Subject: Re: JESS: Jess performance question > > > Hello, > > It is unclear what your intentions are, but it may be that you are > simply using the wrong tool for the task. Another possibility is that > you have inverted the use of facts and rules with respect to your > problem. (I.e., usually facts are used to represent data, rather than > rules.) > > To understand why you're seeing such poor performance, you should study > how the Rete algorithm works. It is the basis for efficient pattern > matching in JESS. > > Loosely speaking, a large ruleset will perform better if the patterns > can grow the Rete network into a heavily branching tree. In your case, > the automated ruleset generation appears to be tending a plush lawn of > grass. The result is that every fact asserted is being compared to each > and every blade! You'll need to partition your ruleset by adding > additional patterns so that facts are quickly matched against entire > branches at once. Look on the web site for tips on writing and ordering > rule patterns. > > Hope this helps, > > Dave Barnett > > On Fri, 16 Jun 2006, Adair, John wrote: > > > I have a ruleset that looks like this: > > (deftemplate MAIN::R > > (slot _r (type OBJECT)) > > (slot a (type STRING))) > > (defrule rule-0 (R (a "a0") (_r ?o)) => (?o a 0)) > > .. > > (defrule rule-499999 (R (a "a499999") (_r ?o)) => (?o a 499999)) > > > > The fact that the numeric part of the rule name and the int argument > > in the method call are the same is not a coincidence, but the > > particular value of the "a" slot that I'm matching on is because of > > the data I generated the ruleset from, not because the rules will > > always look like this. > > > > I seem to be spending a lot of time asserting facts, the main time > > cost of which I assume to be in pattern matching. When I have 5,000 > > rules, it takes about 32 seconds to assert 100,000 facts. When I have > > 20,000 rules, it takes about 160 seconds. This worse-than-linear > > degradation continues as I add rules approaching my target of 500,000 > > rules (which I haven't been able to run successfully yet). > > > > Is there anything I can do to my ruleset to improve performance? > > ---------------------------------------------------------------------- > > -- > > <John S. Adair><[EMAIL PROTECTED]><It's turtles all the way > down.> > > > ------------------------------------------------------------------------ > > > > > > -------------------------------------------------------------------- > > To unsubscribe, send the words 'unsubscribe jess-users > > [EMAIL PROTECTED]' in the BODY of a message to [EMAIL PROTECTED], NOT > > > to the list (use your own address!) List problems? Notify > > [EMAIL PROTECTED] > > -------------------------------------------------------------------- > > > > -------------------------------------------------------------------- > To unsubscribe, send the words 'unsubscribe jess-users [EMAIL PROTECTED]' > in the BODY of a message to [EMAIL PROTECTED], NOT to the list (use > your own address!) List problems? Notify [EMAIL PROTECTED] > -------------------------------------------------------------------- > > > -------------------------------------------------------------------- > To unsubscribe, send the words 'unsubscribe jess-users [EMAIL PROTECTED]' > in the BODY of a message to [EMAIL PROTECTED], NOT to the list > (use your own address!) List problems? Notify > [EMAIL PROTECTED] > -------------------------------------------------------------------- > -------------------------------------------------------------------- > To unsubscribe, send the words 'unsubscribe jess-users [EMAIL PROTECTED]' > in the BODY of a message to [EMAIL PROTECTED], NOT to the list > (use your own address!) List problems? Notify [EMAIL PROTECTED] > -------------------------------------------------------------------- > --------------------------------------------------------- Ernest Friedman-Hill Advanced Software Research Phone: (925) 294-2154 Sandia National Labs FAX: (925) 294-2234 PO Box 969, MS 9012 [EMAIL PROTECTED] Livermore, CA 94550 http://herzberg.ca.sandia.gov -------------------------------------------------------------------- To unsubscribe, send the words 'unsubscribe jess-users [EMAIL PROTECTED]' in the BODY of a message to [EMAIL PROTECTED], NOT to the list (use your own address!) List problems? Notify [EMAIL PROTECTED] -------------------------------------------------------------------- -------------------------------------------------------------------- To unsubscribe, send the words 'unsubscribe jess-users [EMAIL PROTECTED]' in the BODY of a message to [EMAIL PROTECTED], NOT to the list (use your own address!) List problems? Notify [EMAIL PROTECTED] --------------------------------------------------------------------
