On Apr 5, 2004, at 2:53 PM, [EMAIL PROTECTED] wrote:
Hi Erik,

I am really desperate because I cannot clarify the
problem to you - and I am really desperate for help now
as well.

Don't feel so desperate... mocking up a simple main() example should be easy. If it is not, that indicates too much complexity. It should be easy to mock up something that isolates the indexing of a single document (maybe with only one "subject" field) containing "blah blah host defense blah blah", and go from there. I'm afraid we've made things more complex than they need to be, and only a simple example of the situation will help. I simply cannot devote the necessary time to understanding your elaborate description below, sorry.


Please try to create such an example - it will help you understand things better too, not just me. Narrowing things down to very simple examples (look at most of Lucene's test suite to get an idea), written as main() methods or, even better, JUnit tests, helps you tinker with the design easily and clearly.

Simplicity - it's the only way to true understanding. :)

Erik


Creating a sample application would be possible (and
the next step). I call Lucene as web service (could
however try to wrap the WS function with a main() and
create an application for you to run from the command
line).

However please allow me once again to try to explain:

I have lots of small xml files that I want to show only
depending on whether their <subject> tag contains
certain keywords / keyphrases.

They have been indexed using StandardAnalyzer.

As search criterion I pass on terms from a domain
ontology to see what XML files match these terms within
<subject>.

I started using QueryParser:
Query query = QueryParser.parse(line, "name", analyzer);
where 'line' was simply a whitespace-delimited line of
concepts

Worked fine, even could search for keyphrases by
linking the words with underscore, e.g. host_defense.

It did produce an error, however, if the user chose a very
high concept level in the domain ontology, resulting in
200 terms being put into the query string.

As you pointed out, the limitation was obviously the QueryParser (which I could reproduce), so you suggested bypassing QueryParser by constructing a boolean query using TermQuery.

This worked and could take more than 800 (!) terms
without errors (could not test more) but because of
using TermQuery I lost the functionality to search for
phrases, e.g. 'host defense'.

After your last response the only question that remains
to me is the syntax for adding a PhraseQuery on field
<subject>. I could not make sense of the sparse
description in the apidoc for that.

Why am I using the array myquery[]? Well it's simply
the one that passes on the massive amount of query
terms to the web service. I thought that by using a string
array I could maintain the aspect of each search term,
especially when they represent phrases and not single
terms, e.g. myquery[n]="host defense"

I would need something that recognises whether the term
in myquery[n] is a single term (then adding to the
boolean search with TermQuery as usual) OR whether it
is a phrase, then adding with PhraseQuery (for which I
do not know the syntax).
Maybe the PhraseQuery can also add single terms as well
- then I would only need this.
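
A minimal sketch of the single-term vs. phrase decision described above, in plain Java so it runs without Lucene on the classpath. The class name QueryTermSplitter and the helpers words() and isPhrase() are hypothetical, not part of Lucene. The Lucene calls appear only in comments and assume the 1.x API (PhraseQuery.add(Term), BooleanQuery.add(Query, boolean required, boolean prohibited)), so check them against the javadoc of the version in use. Lower-casing is also an assumption: TermQuery and PhraseQuery terms are not analyzed, so they have to match whatever form StandardAnalyzer put into the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class QueryTermSplitter {

    /** Splits one myquery[] entry into lower-cased words, on whitespace or underscores. */
    static List<String> words(String entry) {
        List<String> out = new ArrayList<>();
        for (String w : entry.trim().toLowerCase(Locale.ROOT).split("[\\s_]+")) {
            if (!w.isEmpty()) {
                out.add(w);
            }
        }
        return out;
    }

    /** An entry counts as a phrase when it contains more than one word. */
    static boolean isPhrase(String entry) {
        return words(entry).size() > 1;
    }

    public static void main(String[] args) {
        String[] myquery = {"immunity", "host defense", "Host_Defense"};
        for (String entry : myquery) {
            List<String> words = words(entry);
            if (words.size() > 1) {
                // Assumed Lucene 1.x calls:
                //   PhraseQuery pq = new PhraseQuery();
                //   for each word: pq.add(new Term("subject", word));
                //   booleanQuery.add(pq, false, false);
                System.out.println("phrase: " + words);
            } else if (words.size() == 1) {
                // Assumed Lucene 1.x call:
                //   booleanQuery.add(new TermQuery(new Term("subject", words.get(0))), false, false);
                System.out.println("term:   " + words.get(0));
            }
        }
    }
}
```

With this split, each myquery[n] keeps its identity as one clause of the boolean query, whether it holds one word or several.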

Thanks for your help, Erik

-Holger


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
