Goulish, Michael writes:
> 
> To really preserve the relationships in arbitrarily 
> structured XML, you pretty much need to use a database 
> that directly supports an XML query language like 
> XQuery or XPath.
> 
If searching within regions is enough (something that e.g. sgrep
(http://www.cs.helsinki.fi/u/jjaakkol/sgrep.html) or OpenText/PAT does),
I think this can be done on top of Lucene.

Basically you need to index region start and region end markers.
To search for a term within a region, you can use TermPositions
to loop over all matches of the term and over all start and end markers
of the region, and check which matches fall inside a region.
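
To make the loop concrete, here is a minimal sketch in plain Python. It
does not use the Lucene API; the three sorted position lists stand in for
what TermPositions would return for the term, the start marker and the
end marker within one document. The function name is hypothetical, and
the sketch assumes that regions of this type do not nest, so the i-th
start marker pairs with the i-th end marker.

```python
def matches_in_regions(term_positions, start_positions, end_positions):
    """Return (position, region) pairs for term hits inside a region.

    All three inputs are sorted lists of token positions for a single
    document. Assumption: regions of this type are not nested, so the
    i-th start marker belongs to the i-th end marker.
    """
    regions = list(zip(start_positions, end_positions))
    hits = []
    r = 0  # index of the first region that may still contain a hit
    for pos in term_positions:
        # Skip regions that end before the current term position.
        while r < len(regions) and regions[r][1] < pos:
            r += 1
        if r < len(regions) and regions[r][0] < pos < regions[r][1]:
            hits.append((pos, regions[r]))
    return hits


# Term at positions 3, 8 and 15; regions span positions 2..5 and 10..20.
example_hits = matches_in_regions([3, 8, 15], [2, 10], [5, 20])
```

Since all lists are sorted, this is a single merge pass over the term
positions and the markers, which is why it can be cheap.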

Of course the search logic for region search is quite different from
Lucene's document queries.
There are two types of results (match points and regions), and the
basic operations include match points/regions within a region, regions
containing match points/regions, and joins and intersections of match
points or regions.
I don't know if and how this could be integrated with Lucene's normal
queries. But of course one could get a list of matching documents from
the results of region searches.
If you (ab)use Lucene's token position to store the character position
of the token, you could also extract a region's text from a stored copy.
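
A rough sketch of two of these operations, and of the text extraction,
again in plain Python rather than against Lucene. Regions are
(start, end) pairs of character offsets and match points are single
offsets; all function names and the tiny sample text are made up for
illustration.

```python
def regions_within(inner, outer):
    """Regions from `inner` that lie completely inside some `outer` region."""
    return [r for r in inner
            if any(o[0] <= r[0] and r[1] <= o[1] for o in outer)]


def regions_containing(regions, points):
    """Regions that contain at least one of the given match points."""
    return [r for r in regions
            if any(r[0] <= p < r[1] for p in points)]


def region_text(stored, region):
    """Cut a region's text out of the stored copy, using character offsets."""
    start, end = region
    return stored[start:end]


# Hypothetical stored copy; offsets are derived from it, not hard-coded.
stored = "<t>region search</t><b>search works</b>"
title = (stored.index("<t>") + 3, stored.index("</t>"))
body = (stored.index("<b>") + 3, stored.index("</b>"))
# Character offset of the term "works" (a match point).
works_at = stored.index("works")

hit_regions = regions_containing([title, body], [works_at])
hit_texts = [region_text(stored, r) for r in hit_regions]
```

To get matching documents instead of regions, one would keep the document
number next to each region and collect the distinct numbers at the end.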

I'm currently doing some experiments with this kind of query using Lucene
and find that it performs quite well.

You won't be able to distinguish between parents and other ancestors,
though, and there won't be any support for searching siblings.
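
A small illustration of the parent/ancestor problem, as a plain Python
sketch with made-up offsets: a pure containment test reports every
enclosing region, so a nested outer section matches just like the direct
parent does.

```python
def containing(outer, inner):
    """Regions from `outer` that enclose at least one region from `inner`."""
    return [o for o in outer
            if any(o[0] <= i[0] and i[1] <= o[1] for i in inner)]


# Hypothetical offsets: the second section is nested inside the first,
# and the paragraph is a direct child of the inner section only.
sections = [(0, 100), (10, 50)]
paragraphs = [(20, 30)]

# Both sections come back; containment alone cannot tell which one
# is the paragraph's parent.
enclosing_sections = containing(sections, paragraphs)
```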

Morus
