Fanny,

The current implementation allows for searching on:
  a. the entire PCDATA content of an XML document.
  b. the PCDATA content within specific elements.
  c. processing instructions by name and content.
  d. attributes of elements by both name and value.
  e. elements/PIs with specific parent element types.
  f. elements/PIs at specific child locations within a parent element.
  g. elements/PIs with specific ancestor element types.
  h. elements/PIs with specifically ordered ancestor element types.

The original need we had for XML contextual searching was to find a
specific document that contained a particular element with particular
content, in particular relationships to other element types.

Currently, searching for a document based on the content of two separate
elements with a logical AND relationship is not provided. However, the OR
relationship should work just fine. There is a stored field that contains
all text content for the document, but that probably isn't enough for what
you need. Each Lucene document created from the same XML document has a
'docid' field, so hits can be grouped by source document.

You have two real options:

1. Write a queryparser that inherits from the Lucene one, detects the
relationship, and performs more than one search, grouping results based
on document id. Searching for X and Y would become:

  1. Search for X -> Hits_X
  2. Search for Y -> Hits_Y
  3. Merge Hits_X and Hits_Y based on docid.

-=-

2. Write a queryparser that inherits from the Lucene one, detects that you
are searching for a document based on several elements, as opposed to a
single one, and converts the search from:

  X AND Y

to:

  (X AND docid:docidentifier) OR (Y AND docid:docidentifier)

...and then merges results based on docid.

You may also be able to leverage the search 'Filtering' mechanism, but I'm
not experienced with that...

<<<From FAQ>>>
16. What is filtering and how is it performed?

Filtering means imposing additional restrictions on the hit list to
eliminate hits that would otherwise be included in the search results.
There are two ways to filter hits:

  a. Search Query - in this approach, provide your custom filter object
     when you call the search() method. This filter will be called exactly
     once to evaluate every document that resulted in a non-zero score.
  b. Selective Collection - in this approach you perform the regular search
     and, when you get back the hit list, collect only those hits that
     match your filtering criteria. In this approach, your filter is called
     only for hits that are returned by the search method, which may be
     only a subset of the non-zero matches (useful when evaluating your
     search filter is expensive).
<<< ... >>>

> 1. What is the query string supposed to be if I want to get records which
> contain (Australia and 20020415) or (HongKong and 20020315)?

  ((Australia +tagname:country) AND (+tagname:date +20020415))
  OR
  ((HongKong +tagname:country) AND (+tagname:date +20020315))

> 2. What is the query string supposed to be if I want to get records which
> contain (Australia and 20020415) and (not (HongKong and 20020315))?

  ((Australia +tagname:country) AND (+tagname:date +20020415))
  AND NOT
  ((tagname:country HongKong) AND (tagname:date 20020315))

Either of these queries will require the additional functionality outlined
in options 1 or 2 above.

Regards,
-Brandon

Brandon Jockman
ISOGEN International, LLC.
[EMAIL PROTECTED]

----- Original Message -----
From: "Fanny Yeung" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, May 13, 2002 7:48 AM
Subject: Search on XML files

> Hi,
>
> Does anyone know how to make up the query for a multiple-field search on
> XML files in the sample provided by isogen? Is this supported?
>
> I would like to get all the results which contain the value 'Australia'
> in the tag 'country' AND the date '20020415' in the tag 'date'. I always
> get a hit count of 0. Is there a problem with my query string?
>
> +(Australia AND tagname:country) AND +(20020415 AND tagname:date)
>
> 1. What is the query string supposed to be if I want to get records which
> contain (Australia and 20020415) or (HongKong and 20020315)?
> 2. What is the query string supposed to be if I want to get records which
> contain (Australia and 20020415) and (not (HongKong and 20020315))?
>
> Since I am a newbie to Lucene, I wonder whether I can use a filter to
> restrict the search results. In my case, I need to retrieve all the news
> within a date range (for example, 20020102 to 20020330). In addition, the
> results should only contain news that has been subscribed to. Should I
> use a filter to filter out the unsubscribed news? Or should I make up a
> query string that includes the subscribed news? Which approach is better
> in terms of performance?
>
> Thanks in advance.
>
> Fanny
>
> --
> To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
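
Addendum: a rough sketch of the docid merge from option 1 in the reply above (search for X, search for Y, merge on docid), extended to the OR and NOT combinations asked about in questions 1 and 2. It is plain Python with no Lucene calls, only to show the merge logic. The Hit type, the function names, and the sample docid values are all hypothetical; a real version would run one Lucene search per element condition and read the 'docid' field from each hit.

```python
from typing import Iterable, NamedTuple, Set


class Hit(NamedTuple):
    """One search hit. Hypothetical stand-in for a hit returned by Lucene."""
    docid: str    # value of the 'docid' field shared by hits from one XML file
    score: float


def docids(hits: Iterable[Hit]) -> Set[str]:
    """Collapse element-level hits to the set of XML documents they came from."""
    return {hit.docid for hit in hits}


def merge_and(hits_x: Iterable[Hit], hits_y: Iterable[Hit]) -> Set[str]:
    """Documents that appear in both hit lists (X AND Y at document level)."""
    return docids(hits_x) & docids(hits_y)


# Made-up results of four separate searches, one per element condition.
australia = [Hit("news-1", 0.9), Hit("news-2", 0.4)]   # 'Australia' in country
date_0415 = [Hit("news-1", 0.7), Hit("news-4", 0.5)]   # '20020415' in date
hongkong = [Hit("news-3", 0.8), Hit("news-1", 0.2)]    # 'HongKong' in country
date_0315 = [Hit("news-3", 0.6)]                       # '20020315' in date

group_a = merge_and(australia, date_0415)   # Australia AND 20020415
group_b = merge_and(hongkong, date_0315)    # HongKong AND 20020315

question_1 = group_a | group_b   # (group A) OR (group B)
question_2 = group_a - group_b   # (group A) AND NOT (group B)

print(sorted(question_1))  # ['news-1', 'news-3']
print(sorted(question_2))  # ['news-1']
```

Working on sets of docids keeps the AND, OR, and NOT cases to one set operation each, at the cost of discarding the per-hit scores; ranking the merged documents would need an extra step that is not shown here.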
