Goulish, Michael writes: > > To really preserve the relationships in arbitrarily > structured XML, you pretty much need to use a database > that directly supports an XML query language like > XQuery or XPath. > If searching within regions is enough (something e.g. sgrep (http://www.cs.helsinki.fi/u/jjaakkol/sgrep.html) or OpenText/PAT does), I think this can be done on top of lucene.
Basically you need to index region start and region end markers. In order to search a term within a region, you can use TermPositions to loop over all matches of the term and all start and end markers of the region to check where you find a match within this region. Of course search logic for region search is quite different to lucenes document queries. There are two types of results (match points and regions) and the basic operations include match points/region in region, region containing match points/region, joins and intersection of match points or regions. I don't know if and how this could be integrated with lucenes normal queries. But of course one could get a list of matching documents from results of region searches. If you (ab)use lucenes token position to store the character position of the token, you could also extract the regions text from a stored copy. I'm currently doing some experiments with such kind of queries using lucene and find it performs quite well. You won't be able to distinguish between parents and other ancestors though and there won't be any support for searching siblings. Morus --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
