Re: JESS: matching on XML

Daniel B. Davis Sat, 31 Oct 1998 15:10:57 -0500
At 06:45 PM 10/28/98 -0800, you wrote:
>I think Dave Carlson wrote:
>> 
>> On October 1st, the W3C voted to accept the XML Document Object Model (DOM)
>> as a standard interface for accessing XML document structures.  It's rather
>> limited, but works pretty well for many applications.  So far, the IBM XML
>> parser seems like the only one to support this final release of DOM, but
>> others will surely follow.  The rules below are written about DOM objects,
>> so should work for any compliant XML parser.
>> 
>
>This is great stuff, Dave; my earlier message which stated 'one
>could...' should be amended to read 'Dave Carlson has...' Jeez, I wish
>I had more time to do neat stuff like this!
>
>I know of two other DOM implementations which support the proposed W3C
>DOM 1.0: Sun has an 'early-access' implementation available for
>download at the JDC (developer.javasoft.com), and an independent
>version called Docuverse (http://www.docuverse.com/domsdk/index.html),
>both of which are pretty good. Sun's parser is very fast.
>
>
>---------------------------------------------------------
>Ernest Friedman-Hill  
>Distributed Systems Research        Phone: (510) 294-2154
>Sandia National Labs                FAX:   (510) 294-2234
>Org. 8920, MS 9214                  [EMAIL PROTECTED]
>PO Box 969                  http://herzberg.ca.sandia.gov
>Livermore, CA 94550
>
>---------------------------------------------------------------------
>To unsubscribe, send the words 'unsubscribe jess-users [EMAIL PROTECTED]'
>in the BODY of a message to [EMAIL PROTECTED], NOT to the
>list. List problems? Notify [EMAIL PROTECTED]
>---------------------------------------------------------------------
>

Dave Carlson and Ernest Friedman-Hill and any other interested respondents:

First, thank you for your answers and interest.  Dave, I have not worked
through your stuff yet, but I am going to right away.  I have a problem in
that the work I am doing does not connect with the public Internet or Web,
so that during work hours I cannot communicate publicly.

I have already:
        - Obtained the IBM Parser.
        - Hooked it to JESS.
        - Parsed incoming XML documents.
        - Asserted every element with a start-tag and end-tag as a fact
          containing the start-tag followed by its text, if any, as in:
                f-120: <element> text
          [Thank you, Ernest, the < and > do assert.]
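
The flat assertion step above can be sketched roughly as follows. This is
only an illustration, written in Python rather than Jess: xml.dom.minidom
stands in for the IBM Parser, plain (tag, text) tuples stand in for asserted
facts, and the name flat_facts is hypothetical.

```python
from xml.dom import minidom


def flat_facts(xml_text):
    """Return one (tag, text) pair per child element of the document root."""
    root = minidom.parseString(xml_text).documentElement
    facts = []
    for node in root.childNodes:
        if node.nodeType != node.ELEMENT_NODE:
            continue  # skip whitespace text nodes between elements
        # Keep only the element's own direct text, not text of nested elements.
        text = "".join(
            child.data
            for child in node.childNodes
            if child.nodeType == child.TEXT_NODE
        ).strip()
        # Mirror the "<element> text" shape of the asserted fact.
        facts.append(("<%s>" % node.tagName, text))
    return facts


doc = "<body><e1>alpha</e1><e2>beta</e2></body>"
for tag, text in flat_facts(doc):
    print(tag, text)
```

Anything nested below the first level is ignored here, which is exactly why
this only suits flat, broad documents.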

This is workable if the document has a flat, broad structure, such as:
        <body>
        <e1> ... </e1>
        <e2> ... </e2>
        ...
        </body>
As it so happens, the documents with which I am dealing are 
very like that, so that for the near-term success of this try,
we're ok.  But out there in the 1- or 2-year future are other
documents.

I was looking, in my web posting, for consideration of a general
solution, which would allow matching on documents with deeper
structure without having to repeat that structure in the set of asserted
facts, perhaps by somehow interning and Rete-ifying the document to
facilitate rapid matching.

An example:
   <body> ... </body>
   <s1>
        <e1> ... </e1>
        <e2> ... </e2>
        ...
        <en> ... </en>
   </s1>
   <s1>
        <e1> ... </e1>
        <e2> ... </e2>
        ...
        <en> ... </en>
   </s1>
   <s2>
        <e1> ... </e1>
        <e2> ... </e2>
        ...
        <en> ... </en>
   </s2>

I might want to match on the text of those <e2> elements within (or not
within) <s1>, and so on to any depth.  In such cases, just asserting <e1>
followed by its text is not sufficient.  Possibly it could be made to work
by flattening the tags, that is, by constructing and then asserting:
<1-s1-e1>...
<1-s1-e2> ...
...
<2-s1-e1>...
<2-s1-e2> ...
...
<3-s2-e1>...
<3-s2-e2> ...
Have to think about it.
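
A minimal sketch of that flattening idea, again in Python with
xml.dom.minidom standing in for the IBM Parser and tuples standing in for
facts. The names flattened_facts and match are hypothetical, and the wrapper
root element <doc> is an assumption, since a well-formed document needs a
single root.

```python
from xml.dom import minidom


def element_children(node):
    """Child nodes that are elements (whitespace and text are skipped)."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


def direct_text(node):
    """Concatenated direct text of an element, stripped."""
    return "".join(
        c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE
    ).strip()


def flattened_facts(xml_text):
    """One ("<ordinal-tag-...-leaf>", text) pair per leaf element."""
    root = minidom.parseString(xml_text).documentElement
    facts = []

    def walk(node, path):
        kids = element_children(node)
        if not kids:  # leaf: emit the flattened tag and its text
            facts.append(("<%s>" % "-".join(path), direct_text(node)))
        for kid in kids:
            walk(kid, path + [kid.tagName])

    # The ordinal counts top-level sections, as in <1-s1-e1>, <2-s1-e1>.
    for ordinal, section in enumerate(element_children(root), start=1):
        walk(section, [str(ordinal), section.tagName])
    return facts


def match(facts, leaf, ancestor, inside=True):
    """Texts of `leaf` elements that are (or are not) under `ancestor`."""
    hits = []
    for key, text in facts:
        # Drop the ordinal; note this split breaks if a tag name contains "-".
        parts = key.strip("<>").split("-")[1:]
        if parts[-1] == leaf and (ancestor in parts[:-1]) == inside:
            hits.append(text)
    return hits


doc = ("<doc><s1><e1>a</e1><e2>b</e2></s1>"
       "<s1><e1>c</e1><e2>d</e2></s1>"
       "<s2><e1>e</e1><e2>f</e2></s2></doc>")
facts = flattened_facts(doc)
print(match(facts, "e2", "s1"))                # <e2> within <s1>
print(match(facts, "e2", "s1", inside=False))  # <e2> not within <s1>
```

The obvious cost is that the matcher has to pick the flattened tag apart
again, so the document structure is still being repeated in the facts, only
in encoded form.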

The IBM Parser does offer methods that pull out or match on given 
parts of the XML.  It seems to me that the problem is to bridge the 
gap between the Rete representation and the DOM representation.

Now that I've told you more than I know, I'll go off and look at Dave's stuff.

Ernest, this is a central facility for the entire computing community, if not
at this moment, then over the next few years as the use of XML becomes more
ubiquitous than HTML.  I would bet that if you could solve the problem cleanly
for Java 1.2's new HTML objects, the solution would transfer into Java x.y
once XML documents become first-class Java objects.

Of course, that won't help me or my problem.
(Unless I want to write enough parser to redevelop the HTML set for XML.)
I like the IBM parser and, for the time being, I'll go with it.

Open here for all thoughts and suggestions.  Since the flat documents already
work, there is a bit of time in hand to get a good solution to the more
general problem.  For example, how about asserting a fact stating that a
certain specific XML document exists, then using specialized match operators
driven by that document fact, e.g.
                f-nnn: (XML "URL" <DOM-object>)
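
One way to picture that document-fact idea, as a sketch only: a single tuple
plays the role of the (XML "URL" <DOM-object>) fact, and a helper walks the
DOM on demand instead of asserting one fact per element. This is Python with
xml.dom.minidom, not Jess or the IBM Parser; assert_document, texts_at and
the example URL are all hypothetical.

```python
from xml.dom import minidom


def assert_document(url, xml_text):
    """Build a single fact of the form (XML "URL" <DOM-object>)."""
    return ("XML", url, minidom.parseString(xml_text))


def texts_at(fact, path):
    """Hypothetical match operator driven by the document fact.

    Follows the tag names in `path` downward from the root element and
    returns the direct text of every element reached.
    """
    _, _, dom = fact
    nodes = [dom.documentElement]
    for tag in path:
        nodes = [
            child
            for node in nodes
            for child in node.childNodes
            if child.nodeType == child.ELEMENT_NODE and child.tagName == tag
        ]
    return [
        "".join(
            c.data for c in n.childNodes if c.nodeType == c.TEXT_NODE
        ).strip()
        for n in nodes
    ]


fact = assert_document(
    "file:example.xml",  # placeholder URL, not a real document
    "<doc><s1><e2>b</e2></s1><s1><e2>d</e2></s1><s2><e2>f</e2></s2></doc>",
)
print(texts_at(fact, ["s1", "e2"]))
print(texts_at(fact, ["s2", "e2"]))
```

With this shape only one fact per document enters working memory, and the
depth of the document is handled by the operator rather than by the facts.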

Thanks again.
