Hi,

I'm trying to develop a class to handle an XML document whose
contents are indexed per element rather than per document.
Each element has a unique ID, so I'm looking to create a
class/method similar to Lucene's Document.Document(). By way
of example, I'll use some XHTML markup to illustrate what I'm
trying to do:

  <html>
   <base href="http://purl.org/ceryle/blat.xml"/>
   [...]
   <body>
     <p id="p1">
        some text to index...
     </p>
     <p id="p2">
        some more text to index...
     </p>
     <p id="p3">
        even more text to index...
     </p>
   </body>
  </html>

I'd very much appreciate any help in explaining how I'd go about
creating a method that returns a Lucene Document so that this
content can be indexed by ID. Would I want a separate Document per
<p>? (There are many thousands of such elements.) Everything in my
system, at both the document and the individual element level, is
addressed by URL, so the method should create a URL for each <p>
element, like

   http://purl.org/ceryle/blat.xml#p1
   http://purl.org/ceryle/blat.xml#p2
   http://purl.org/ceryle/blat.xml#p3
   etc.

I don't need anyone to go to the trouble of coding this, just point
me to how it might be done, or to any existing examples that do this
kind of thing.
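
To make the question concrete, here is a rough sketch of the extraction
half only, using nothing but the JDK's DOM parser. The class name
ElementUrlExtractor and the method extract() are made up for illustration,
and it assumes the <base href> is present and that only <p> elements with
an id attribute matter. Each map entry is what would presumably become one
Lucene Document, since search hits come back per Document.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: maps "<base href>#<id>" to the text of each <p id="...">.
final class ElementUrlExtractor {

    static Map<String, String> extract(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            org.w3c.dom.Document dom =
                builder.parse(new InputSource(new StringReader(xml)));

            // Take the document URL from the first <base href="..."/>, if any.
            NodeList bases = dom.getElementsByTagName("base");
            String base = bases.getLength() > 0
                ? ((Element) bases.item(0)).getAttribute("href")
                : "";

            // Insertion-ordered, so entries follow document order.
            Map<String, String> byUrl = new LinkedHashMap<>();
            NodeList paras = dom.getElementsByTagName("p");
            for (int i = 0; i < paras.getLength(); i++) {
                Element p = (Element) paras.item(i);
                String id = p.getAttribute("id");
                if (id.isEmpty()) {
                    continue; // no ID, so no fragment URL can be built
                }
                // Each (url, text) pair would become one Lucene Document.
                // Assumption: with the Lucene 1.4 API this might look like
                //   doc.add(Field.Keyword("url", url));
                //   doc.add(Field.Text("contents", text));
                // where the field names "url" and "contents" are placeholders.
                byUrl.put(base + "#" + id, p.getTextContent().trim());
            }
            return byUrl;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Storing the URL as an untokenized field would let a hit be mapped straight
back to its element, but whether one Document per <p> scales to many
thousands of elements is exactly what I'm unsure about.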

Thanks very much!

Murray

......................................................................
Murray Altheim                    http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK               .
