You can have fields which correspond to each of the element names in your documents. If you index all of the terms which occur within each field, via a custom Document generator and a custom stream/file parser for your XML documents, you could use the A:foo B:foo C:foo syntax in queries using the default query parser. Come to think of it, given appropriate field naming, I think you can use the A/B:foo A/B/C:foo syntax, too: just add fields named "A/B", "A/B/C", etc. (BTW, using the "A:B:C:foo" syntax is just asking for trouble, unless you want to write your own query parser, since the ':' character is used to separate field names from query terms).
For example, the document parser could return a list of element type names to the document generator, and a concatenation of the text nodes occurring within each element type (you'd have to think about multiple occurrences of element types in a single document, of course). Your document generator class would then just add a tokenized field (I'm assuming you want tokenization here) for each element type name (or XPath-ish expression) and give the flattened representation for it as the value parameter to the Field constructor.
To use your XML document example, your Document generator would pass the document to your parser, which could report a series of fields (I'll use the XPath-ish syntax here): "A", containing "foo", "A/B", containing "foo", and "A/B/C", containing "foo". (It might make more sense in actual practice to reverse the reporting order from the parser, but you get the idea.)
Hope this helps,
Steve Rowe
Harry Foxwell wrote:
in an xml document, say <A> <B> <C>foo</C> </B> </A>Lucene syntax allows me to search for term 'C:foo' Is there a way to specify query for 'foo' in one of the parents of C? Something like B:foo, B:C:foo, B/C:foo, or something like that idea (but of course these don't work).
-- To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
