Hi Erik,
I am really desperate because I cannot make the problem clear to you, and I urgently need help now as well.
Don't feel so desperate... Mocking up a simple main() example should be easy. If it is not, that indicates too much complexity. It should be easy to mock up something that isolates the indexing of a single document (maybe with only one "subject" field) containing "blah blah host defense blah blah", and go from there. I'm afraid we've made things more complex than they need to be, and only a simple example of the situation will help. I simply cannot devote the time necessary to understanding your elaborate description below, sorry.
Please try to create such an example. It will help you understand things better too, not just me. Narrowing things down to very simple examples lets you tinker with the design easily and clearly. Write them as main() programs or, even better, as JUnit tests (look at most of Lucene's test suite to get an idea).
Simplicity - it's the only way to true understanding. :)
Erik
Creating a sample application would be possible (and is the next step). I call Lucene as a web service, but I could try to wrap the WS function in a main() and create an application for you to run from the command line.
However, please allow me to try once more to explain:
I have lots of small XML files that I want to show only if their <subject> tag contains certain keywords/keyphrases.
They have been indexed using StandardAnalyzer.
As search criteria I pass in terms from a domain ontology to see which XML files match these terms within <subject>.
I started out using QueryParser: Query query = QueryParser.parse(line, "name", analyzer); where 'line' was simply a whitespace-delimited line of concepts.
This worked fine. I could even search for keyphrases by joining the words with an underscore, e.g. host_defense.
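For reference, here is a minimal Java sketch of how such a line could be assembled from ontology concepts. It assumes the concepts arrive as a list of strings. QueryLineBuilder and toQueryLine are hypothetical names. The QueryParser call in the comment is only the Lucene 1.x static form quoted above, not something this code executes.

```java
import java.util.ArrayList;
import java.util.List;

public class QueryLineBuilder {
    // Hypothetical helper: joins ontology concepts into one whitespace-delimited line.
    // Multi-word concepts are glued with '_' (e.g. "host defense" -> "host_defense"),
    // mirroring the underscore trick described above. Blank concepts are skipped.
    public static String toQueryLine(List<String> concepts) {
        List<String> parts = new ArrayList<>();
        for (String concept : concepts) {
            String trimmed = concept.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            parts.add(trimmed.replaceAll("\\s+", "_"));
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String line = toQueryLine(List.of("immunity", "host defense", "  antigen "));
        System.out.println(line); // immunity host_defense antigen
        // The line would then be handed to QueryParser, e.g. (Lucene 1.x, as quoted above):
        // Query query = QueryParser.parse(line, "name", analyzer);
    }
}
```

Every concept becomes one whitespace-separated token of the line. With a few hundred concepts, that one line carries the whole query, which is where the QueryParser trouble described next comes in.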
However, it produced an error if the user chose a very high concept level in the domain ontology, resulting in 200 terms being put into the query string.
As you pointed out, the limitation was obviously in QueryParser (which I could reproduce), so you suggested bypassing QueryParser by constructing a BooleanQuery from TermQuery clauses.
This worked and could take more than 800 (!) terms without errors (I could not test more). But because I was using TermQuery, I lost the ability to search for phrases, e.g. 'host defense'.
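To see why a TermQuery cannot find 'host defense', here is a toy Java model of the two kinds of matching, applied to the "blah blah host defense blah blah" text from the suggested test document. It is not Lucene's actual code. A TermQuery looks for one indexed token. A PhraseQuery with slop 0 requires its tokens at consecutive positions, in the same order. The class and method names are made up for illustration, and the tokenizer only approximates StandardAnalyzer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class PhraseMatchDemo {
    // Rough stand-in for an analyzer: lowercase, then split on anything that is not a letter or digit.
    // ASSUMPTION: StandardAnalyzer does more than this (e.g. stop-word removal); this is an approximation.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // TermQuery-like check: is this exact token present anywhere in the field?
    public static boolean termMatches(List<String> fieldTokens, String term) {
        return fieldTokens.contains(term);
    }

    // PhraseQuery-like check with slop 0: do the phrase tokens appear consecutively and in order?
    public static boolean phraseMatches(List<String> fieldTokens, List<String> phraseTokens) {
        return !phraseTokens.isEmpty() && Collections.indexOfSubList(fieldTokens, phraseTokens) >= 0;
    }

    public static void main(String[] args) {
        List<String> subject = tokenize("blah blah host defense blah blah");
        System.out.println(termMatches(subject, "host defense"));            // false: no single token "host defense"
        System.out.println(phraseMatches(subject, tokenize("Host Defense"))); // true
        System.out.println(phraseMatches(subject, tokenize("defense host"))); // false: wrong order
    }
}
```

The index never contains the single token "host defense", only "host" and "defense" at adjacent positions. That is why one TermQuery per array entry cannot match a phrase, while a position-aware query can.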
After your last response, the only question that remains for me is the syntax for adding a PhraseQuery on the <subject> field. I could not make sense of the sparse description in the API docs.
Why am I using the array myquery[]? It is simply what passes the large number of query terms to the web service. I thought that by using a string array I could keep each search term intact, especially when it represents a phrase rather than a single term, e.g. myquery[n]="host defense".
I need something that recognises whether the entry in myquery[n] is a single term or a phrase. A single term would be added to the BooleanQuery with TermQuery as usual. A phrase would be added with PhraseQuery, for which I do not know the syntax. Maybe PhraseQuery can handle single terms as well; then I would only need that.
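Here is a minimal sketch of that decision. It assumes each myquery[n] entry is plain text whose words are separated by whitespace. ConceptClassifier, tokens() and isPhrase() are hypothetical names. The whitespace split and lowercasing only approximate what StandardAnalyzer did at index time. That matters because the terms given to TermQuery and PhraseQuery are used verbatim, not analyzed. The comments name the Lucene constructors one would call in each branch; check them against the Lucene version actually in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ConceptClassifier {
    // Hypothetical helper: splits one myquery[n] entry into lowercase tokens.
    // ASSUMPTION: approximates StandardAnalyzer (no stop-word removal etc.);
    // real code should run the entry through the analyzer used at index time.
    public static List<String> tokens(String entry) {
        List<String> out = new ArrayList<>();
        for (String t : entry.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // One token means a single term; several tokens mean a phrase.
    public static boolean isPhrase(String entry) {
        return tokens(entry).size() > 1;
    }

    public static void main(String[] args) {
        String[] myquery = {"immunity", "host defense", "  Antigen  "};
        for (String entry : myquery) {
            List<String> toks = tokens(entry);
            if (toks.isEmpty()) {
                continue; // skip blank entries
            }
            if (toks.size() == 1) {
                // Lucene: new TermQuery(new Term("subject", toks.get(0)))
                System.out.println("TERM   " + toks.get(0));
            } else {
                // Lucene 6+: new PhraseQuery("subject", toks.toArray(new String[0]))
                // Older Lucene (e.g. 1.x): new PhraseQuery(), then add(new Term("subject", t)) per token
                System.out.println("PHRASE " + String.join(" ", toks));
            }
            // Either query then goes into the BooleanQuery as an optional clause:
            // add(query, false, false) in Lucene 1.x, BooleanClause.Occur.SHOULD in later versions.
        }
    }
}
```

The field name "subject" appears only in the comments, so the same loop works for any field. The decision itself is just the token count of each entry after analysis.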
Thanks for your help, Erik
-Holger
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]