Hi,
On 09/09/13 23:25, [email protected] wrote:
Hi,
I'm considering Jena Rules as a rule-based programming model
where rules are discovered and accumulated until they number tens
of thousands, while the facts used for inferring new information
are only a few RDF statements. In this case, the rule engine may
have to check each and every rule against the facts to find the
ones that match the statements, which may imply a scaling issue.
Or should the rules be organized into a set of categories, with
each statement classified first to select the matching rule
set and reduce the rule processing time?
Will appreciate your insights,
In theory, the primary scaling issue in this case should be the number of
distinct patterns in the rules rather than the number of rules. In RETE,
the rules are compiled into a pattern-matching network that facts are
dropped into, so a pattern shared by many rules only needs to be tested
once.
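
To make the "distinct patterns rather than rules" point concrete, here is a minimal toy sketch in plain Java. It is not Jena code and does not reflect how Jena builds its network; the class name, the method names and the idea of reducing a pattern to a bare predicate name are all assumptions made for illustration.

```java
import java.util.*;

public class SharedPatternSketch {
    // Maps each distinct body pattern (here reduced to a predicate name)
    // to the names of the rules that use it.
    static Map<String, List<String>> indexByPattern(Map<String, List<String>> rules) {
        Map<String, List<String>> index = new HashMap<>();
        for (Map.Entry<String, List<String>> rule : rules.entrySet()) {
            for (String pattern : rule.getValue()) {
                index.computeIfAbsent(pattern, k -> new ArrayList<>()).add(rule.getKey());
            }
        }
        return index;
    }

    // Builds ruleCount toy rules whose single body pattern cycles over
    // patternCount hypothetical predicates p0, p1, ...
    static Map<String, List<String>> toyRules(int ruleCount, int patternCount) {
        Map<String, List<String>> rules = new LinkedHashMap<>();
        for (int i = 0; i < ruleCount; i++) {
            rules.put("rule" + i, List.of("p" + (i % patternCount)));
        }
        return rules;
    }

    public static void main(String[] args) {
        Map<String, List<String>> index = indexByPattern(toyRules(10000, 50));
        // A fact with predicate p7 needs one lookup and then only touches
        // the rules that share that pattern.
        System.out.println(index.size() + " distinct patterns, "
                + index.get("p7").size() + " rules behind p7");
    }
}
```

With 10000 toy rules over 50 predicates, a fact is tested against 50 patterns rather than 10000 rules. This is also roughly what classifying statements into rule categories by hand would achieve.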
However, in practice the Jena rules implementation is crude and hasn't
been designed or tested on huge numbers of rules. So the network it
produces may be suboptimal (especially if grown incrementally) and there
is no indexing in the cases where one node fans out to a very large
number of child nodes. Given the simplicity of the Jena implementation,
putting the more discriminating patterns at the start of the rules is
at least likely to help.
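
A rough way to see why pattern order could matter is the toy sketch below, again in plain Java rather than Jena. It assumes each rule body has one pattern common to all rules and one pattern specific to that rule, ignores variables entirely, and simply counts how many rules get past their first pattern for a given set of facts. All names are hypothetical.

```java
import java.util.*;

public class PatternOrderSketch {
    // Counts rules whose FIRST body pattern matches some fact, i.e. rules
    // for which a partial match would have to be created and carried forward.
    // Patterns and facts are toy "predicate object" strings.
    static int rulesWithPartialMatch(List<List<String>> ruleBodies, Set<String> facts) {
        int count = 0;
        for (List<String> body : ruleBodies) {
            if (facts.contains(body.get(0))) {
                count++;
            }
        }
        return count;
    }

    // Builds ruleCount two-pattern bodies: one pattern common to every rule
    // and one specific to each rule, in the requested order.
    static List<List<String>> toyBodies(int ruleCount, boolean specificFirst) {
        List<List<String>> bodies = new ArrayList<>();
        for (int i = 0; i < ruleCount; i++) {
            String common = "type Thing";
            String specific = "code c" + i;
            bodies.add(specificFirst ? List.of(specific, common) : List.of(common, specific));
        }
        return bodies;
    }

    public static void main(String[] args) {
        Set<String> facts = Set.of("type Thing", "code c7");
        System.out.println("common first:   "
                + rulesWithPartialMatch(toyBodies(10000, false), facts));
        System.out.println("specific first: "
                + rulesWithPartialMatch(toyBodies(10000, true), facts));
    }
}
```

In this toy setup, leading with the common pattern lets every rule start a partial match, while leading with the specific pattern leaves only the one rule that can actually fire. Real rules with variables and joins will behave less cleanly.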
The only way to check if Jena could cope with this would be to run some
representative tests.
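
As a starting point for such a test, a synthetic rule set can be generated as text. The sketch below writes rules in Jena's rule syntax using a made-up namespace and made-up predicates (p0, q0, kind and so on), with the rule-specific pattern placed first. The rule shape is only an assumption about what "representative" might mean; real rules should replace it.

```java
public class RuleSetGenerator {
    // Hypothetical namespace for the synthetic vocabulary.
    static final String NS = "http://example.org/test#";

    // Emits ruleCount rules, one per line. Each rule has a rule-specific
    // pattern first, a pattern shared by all rules second, and one conclusion.
    static String generate(int ruleCount) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < ruleCount; i++) {
            sb.append("[r").append(i).append(": ")
              .append("(?x <").append(NS).append("p").append(i).append("> ?y) ")
              .append("(?x <").append(NS).append("kind> ?k) ")
              .append("-> (?x <").append(NS).append("q").append(i).append("> ?y)]\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print a small sample; raise the count for an actual test run.
        System.out.print(generate(3));
    }
}
```

The generated text could then be parsed with Rule.parseRules, handed to a GenericRuleReasoner, and timed against a handful of statements at increasing rule counts. That part is left out here because it depends on the Jena version in use.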
Dave