It think option B cannot work because due to the MUST operator it requires
both "databasemanagement" and "accountmanagement" to be in the subtype
field.

Option A however should work, once the padding blank spaces are removed
from the field name - notice that while the standard analyzer would trim
spaces from the processed query text, the field names provided remain
unchanged - in this case - most probably - with the spaces.

Additional comment - that I'm not sure that is relevant to your case - on
the solution to this problem:
In this case you had two paths:
    /a/b
    /a/c
And you managed (or would soon manage:-) to ask for a document in either
two paths by asking for "a" in first part and "b" or "c" in second part.
However if the "taxonomy" becomes more complex this may turn more tricky.
For instance if the scenario would have the following possible paths:
   /a/b/c/d/e
   /a/b/c/x/y/z
   /a/b/d/x/f
etc., and assume you want all docs under the sub-tree defined by /a/b/c.
One possibility would be to index for each document all path prefixes -
i.e. for document in /a/b/c/d/e add path tokens (un-tokenized) -     / ,
/a/ . /a/b/ , /a/b/c/d/ , /a/b/c/d/e/ , /a/b/c/d/e   (the latter token
would allow to search also "exact node" matches, i.e. not sub-tree
matches.) I believe you can find useful discussions on this by searching in
the user mailing list for "path" or "hierarchy", and for sure there are
other approaches.

"Pramodh Shenoy" <[EMAIL PROTECTED]> wrote on 12/09/2006 10:54:13:

> The spaces just came i guess when i copied the code to outlook :-),
> actually there arent any. Let me take a look at Luke , especially
> testing to see what should be returned when i run the aprsed query..
> sounds very interesting..
>
> Thanks a lot
> Pramodh
>
> ________________________________
>
> From: Erick Erickson [mailto:[EMAIL PROTECTED]
> Sent: Tue 9/12/2006 11:19 PM
> To: java-user@lucene.apache.org
> Subject: Re: group field selection of the form field:(a b c)
>
>
>
> Interestingly, you have extra spaces when you construct your queries,
e.g.
> queries[2]= " accountmanagement" has an extra space at the beginning but
> when you index the document, there are no spaces. I believe that since
> you're indexing the fields UN_TOKENIZED, that the spaces are preserved in
> the query (but I'm not entirely clear on this point, so don't take my
word
> for it completely <G>).
>
> Have you used Luke to examine your index? You can also put parsed form of
> the query into Luke and play around with that to see what *should* work.
> Google lucene luke and you'll find it right away.
>
> Best
> Erick
>
> On 9/12/06, Pramodh Shenoy <[EMAIL PROTECTED]> wrote:
> >
> > Hi Eric/Usergroup,
> >
> >     I am working on a help content index-search project based on
Lucene.
> > One of my requirements is to search for a particular text in the
content
> > of files from specific directories. When I index the content
> >
> >
> >
> > Eg. guides/accountmanagement/index.htm and
> > guides/databasemanagement/index.htm
> >
> >
> >
> > doc.add(new Field("booktype", "guides", Field.Store.YES,
> > Field.Index.UN_TOKENIZED))
> >
> > doc.add(new Field("subtype", "accountmanagement", Field.Store.YES,
> > Field.Index.UN_TOKENIZED))
> >
> > doc.add(new Field("subtype", "databasemanagement", Field.Store.YES,
> > Field.Index.UN_TOKENIZED))
> >
> > doc.add(new Field("content",
> > all-content-read-from-html-body-as-a-string, Field.Store.NO,
> > Field.Index.TOKENIZED))
> >
> >
> >
> > Now I want to search for all occurrences of "management" in the
> > "content" field (which already exists in both the above index.htm files
> > body), in files under subtype/accountmanagement and under subtype/
> > databasemanagement
> >
> >
> >
> > Iam creating the query as below:
> >
> >
> >
> >         String [] queries = new String [3];// =new String[4]
> >
> >         String [] fields = new String [3] ];// =new String[4]
> >
> >         BooleanClause.Occur[] flags = new BooleanClause.Occur[3] ];//
> > =new String[4]
> >
> >
> >
> >         queries[0]= " guides ";
> >
> >         fields[0]=" booktype ";
> >
> >         flags[0] = BooleanClause.Occur.MUST;
> >
> >
> >
> >         queries[1]= " management ";
> >
> >         fields[1]="content";
> >
> >         flags[1] = BooleanClause.Occur.MUST;
> >
> >
> >
> >         /* ######## A ####### */
> >
> >         queries[2]= " accountmanagement databasemanagement ";
> >
> >         fields[2]=" subtype ";
> >
> >         flags[2] = BooleanClause.Occur.MUST;
> >
> >
> >
> >         /* ##### B #######
> >
> >         queries[2]= " accountmanagement";
> >
> >         fields[2]="subtype";
> >
> >         flags[2] = BooleanClause.Occur.MUST;
> >
> >
> >
> >         queries[3]= " databasemanagement ";
> >
> >         fields[3]=" subtype ";
> >
> >         flags[3] = BooleanClause.Occur.MUST;
> >
> >         */
> >
> >
> >
> >         Query queryObj = null;
> >
> >         //parse the query string
> >
> >         try {
> >
> >             queryObj = MultiFieldQueryParser.parse(queries, fields,
> > flags, new StandardAnalyzer());
> >
> >         } catch (ParseException exp) { }
> >
> >
> >
> >
> >
> > With option A , the query generated looks like:
> >
> > +booktype:guides +content:management +(subtype: accountmanagement
> > subtype: databasemanagement)
> >
> >
> >
> > With option B , the query generated looks like:
> >
> > +booktype:guides +content:management +subtype: accountmanagement
> > +subtype: databasemanagement
> >
> >
> >
> >
> >
> > Both return no Hits.!
> >
> >
> >
> > Any idea how I should create the query. In Lucene In Action, this is
> > explained as "you can group field selection over several terms using
> > field:(a b c)". How can I achieve this with the code above ?
> >
> >
> >
> > Thanks
> >
> > Pramodh
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to