Hi Dave,

Thanks for your valuable insights on how to structure the rule
internals to improve scalability. I'll do some research on my
rule generator to implement that. The scaling issue is running
the rule engine in real time so that it can keep up with
real-world message streams such as tweets.

Best

Chan

Hi,

On 09/09/13 23:25, [email protected] wrote:
> Hi,
>
> I'm considering Jena Rules as a rule-based programming model
> in which rules are discovered and accumulated until they number
> tens of thousands, while the facts used to infer new information
> are only a few RDF statements. In this case, the rule engine may
> have to check each and every rule against the facts to find the
> ones matching the statements, which may imply a scaling issue.
>
> Or should the rules be organized into a set of categories, with
> each statement classified first to select the matching rule
> set and so reduce the rule processing time?
>
> I would appreciate your insights,

In theory the primary scaling issue in this case should be the number
of distinct patterns in the rules rather than the number of rules. In
RETE the rules are implemented as a pattern matching network and facts
are dropped in.
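
To illustrate why the number of distinct patterns matters more than
the number of rules, here is a minimal Python sketch. It is not Jena
code and not a full RETE network (there are no join nodes, and each
rule has a single pattern). The PatternNetwork class, the rule names
and the sample predicates are all hypothetical, chosen only for the
illustration.

```python
from collections import defaultdict

# A pattern is a (subject, predicate, object) triple; None acts as a wildcard.
def matches(pattern, fact):
    return all(p is None or p == f for p, f in zip(pattern, fact))

class PatternNetwork:
    """Toy network: one node per distinct pattern, shared by all rules using it."""

    def __init__(self):
        self.rules_by_pattern = defaultdict(list)

    def add_rule(self, name, pattern):
        # Rules with an identical pattern end up on the same node.
        self.rules_by_pattern[pattern].append(name)

    def drop_in(self, fact):
        """Return (names of rules whose pattern matched, number of pattern tests)."""
        fired, tests = [], 0
        for pattern, names in self.rules_by_pattern.items():
            tests += 1
            if matches(pattern, fact):
                fired.extend(names)
        return fired, tests

# Hypothetical data: 10,000 rules that share only three distinct patterns.
PATTERNS = [
    (None, "hasTag", "jena"),
    (None, "postedBy", None),
    (None, "mentions", None),
]
net = PatternNetwork()
for i in range(10000):
    net.add_rule(f"rule{i}", PATTERNS[i % 3])

fired, tests = net.drop_in(("tweet1", "hasTag", "jena"))
print(len(fired), "rules matched after", tests, "pattern tests")
```

In this toy setup the fact is tested against three patterns rather
than 10,000 rules. The loop over the nodes is a plain linear scan, so
if the rules had thousands of distinct patterns the cost would grow
accordingly unless the nodes were indexed.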

However, in practice the Jena rules implementation is crude and
hasn't been designed or tested on huge numbers of rules. So the
network it produces may be suboptimal (especially if grown
incrementally) and there is no indexing in the cases where one node
fans out to a very large number of child nodes. Given the simplicity
of the Jena implementation, at least putting the more discriminating
patterns at the start of the rules is likely to help.
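
As a rough illustration of why pattern order can matter, the Python
sketch below uses a naive left-to-right matcher and counts the partial
bindings it creates. This is an assumption-laden toy, not how Jena's
engine is implemented; the function names, the "?x" variable
convention and the sample facts are hypothetical.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def unify(pattern, fact, binding):
    """Return binding extended so that pattern equals fact, or None."""
    new = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if new.setdefault(p, f) != f:
                return None  # variable already bound to a different value
        elif p != f:
            return None      # constant in the pattern does not match
    return new

def match_body(body, facts):
    """Match the body patterns in order.

    Returns (solutions, work), where work is the total number of
    partial bindings created along the way.
    """
    bindings, work = [{}], 0
    for pattern in body:
        next_bindings = []
        for b in bindings:
            for fact in facts:
                nb = unify(pattern, fact, b)
                if nb is not None:
                    next_bindings.append(nb)
        work += len(next_bindings)
        bindings = next_bindings
    return bindings, work

# Hypothetical facts: 1000 "posted" statements and a single tagged tweet.
facts = [(f"user{i}", "posted", f"tweet{i}") for i in range(1000)]
facts.append(("tweet7", "hasTag", "jena"))

general_first = [("?u", "posted", "?t"), ("?t", "hasTag", "jena")]
selective_first = [("?t", "hasTag", "jena"), ("?u", "posted", "?t")]

slow_solutions, slow_work = match_body(general_first, facts)
fast_solutions, fast_work = match_body(selective_first, facts)
print(slow_work, fast_work)  # prints "1001 2"
```

Both orderings find the same single solution, but the ordering that
starts with the selective pattern creates far fewer intermediate
bindings. How much of this carries over to a real rule set would need
to be measured.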

The only way to check if Jena could cope with this would be to run
some representative tests.

Dave
