Hi Erik, all,

I will write a small application (using RAM indexing) soon. For the
moment I have implemented my application using the QueryParser,
breaking the search string down into pieces of 100
keywords/keyphrases. This is still fast enough for my application, as
searches with more than 100 terms are quite infrequent.
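A minimal sketch of the batching workaround described above, assuming the expanded search terms are already held in a list. The names QueryChunker, chunk and toQueryLine are made up for illustration and are not part of Lucene; each resulting line would then be handed to QueryParser.parse as before.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: splits a long term list into
// batches small enough for the query parser to accept.
class QueryChunker {

    /** Splits terms into consecutive batches of at most batchSize entries. */
    static List<List<String>> chunk(List<String> terms, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < terms.size(); start += batchSize) {
            int end = Math.min(start + batchSize, terms.size());
            // Copy the sublist so each batch is independent of the input list.
            batches.add(new ArrayList<>(terms.subList(start, end)));
        }
        return batches;
    }

    /** Joins one batch into a whitespace-delimited line for the parser. */
    static String toQueryLine(List<String> batch) {
        return String.join(" ", batch);
    }
}
```

With a batch size of 100, a list of 250 terms gives three query lines. The hits from the separate searches still have to be merged and de-duplicated by the caller.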
The reason why I abandoned the BooleanQuery plus StandardAnalyzer
combination: BooleanQuery is able to 'digest' more than 800 terms as
input from my search expansion. However, bringing StandardAnalyzer
into the equation gave the same problem as with the query parser: it
produced a 'line too long' error when more than ~150 terms were
supplied.

Thanks for all your support this time. I've learned a lot about
Lucene, and after all I have a working application.

-Holger

On Mon, 5 Apr 2004 15:49:41 -0400, Erik Hatcher wrote:
>
> On Apr 5, 2004, at 2:53 PM, [EMAIL PROTECTED] wrote:
> > Hi Erik,
> >
> > I am really desperate because I cannot clarify the problem to
> > you - and I am really desperate for help now as well.
>
> Don't feel so desperate.... mocking up a simple main() example
> should be easy. If it is not, then it indicates too much complexity.
> However, it should be easy to mock up something that isolates the
> indexing of a single document (maybe only one "subject" field) with
> "blah blah host defense blah blah" in it, and go from there. I'm
> afraid we've made things more complex than they need to be, and only
> a simple example of the situation will help. I simply cannot devote
> the necessary time into understanding your elaborate description
> below, sorry.
>
> Please try to create such an example - it will help you understand
> things better too, not just me. Narrowing things down to very simple
> examples (look at most of Lucene's test suite to get an idea) as
> main() or, even better, JUnit tests helps you tinker with design
> easily and clearly.
>
> Simplicity - it's the only way to true understanding. :)
>
> Erik
>
> > Creating a sample application would be possible (and the next
> > step). I call Lucene as web service (could however try to wrap the
> > WS function with a main() and create an application for you to run
> > from the command line).
> >
> > However please allow me once again to try to explain:
> >
> > I have lots of small xml files that I want to show only depending
> > on whether their <subject> tag contains certain keywords /
> > keyphrases.
> >
> > They have been indexed using StandardAnalyzer.
> >
> > As search criterion I pass on terms from a domain ontology to see
> > what XML files match these terms within <subject>.
> >
> > I started using QueryParser:
> >   Query query = QueryParser.parse(line, "name", analyzer);
> > where 'line' was simply a whitespace-delimited line of concepts.
> >
> > Worked fine, even could search for keyphrases by linking the words
> > with underscore, e.g. host_defense.
> >
> > Did produce an error however if the user chooses a very high
> > concept level in the domain ontology, resulting in more than 200
> > terms to be put into the query string.
> >
> > As you pointed out the limitation was obviously the QueryParser
> > (which I could reproduce) so you suggested to bypass QueryParser
> > by constructing a boolean query using TermQuery.
> >
> > This worked and could take more than 800 (!) terms without errors
> > (could not test more) but because of using TermQuery I lost the
> > functionality to search for phrases, e.g. 'host defense'.
> >
> > After your last response the only question that remains to me is
> > the syntax for adding a PhraseQuery on field <subject>. I could
> > not make sense of the sparse description in the apidoc for that.
> >
> > Why am I using the array myquery[]? Well it's simply the one that
> > passes on the massive amount of query terms to the web service. I
> > thought by using a string array I could maintain the aspect of
> > each search term, especially when they represent phrases and not
> > single terms, e.g.
> > myquery[n]="host defense"
> >
> > I would need something that recognises whether the term in
> > myquery[n] is a single term (then adding to the boolean search
> > with TermQuery as usual) OR whether it is a phrase, then adding
> > with PhraseQuery (for which I do not know the syntax).
> > Maybe the PhraseQuery can also add single terms as well - then I
> > would only need this.
> >
> > Thanks for your help, Erik
> >
> > -Holger

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
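For the question in the quoted message about telling single terms from phrases in myquery[], the decision step could be sketched roughly as follows. This is plain Java with no Lucene dependency; QueryPlanner, Clause and plan are hypothetical names. It assumes entries are whitespace-delimited and that the index was built with an analyzer that lowercases tokens (as StandardAnalyzer does), which is why the words are lowercased here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: decides per entry whether it
// should become a single-term clause or a phrase clause.
class QueryPlanner {

    /** One planned clause: a single word, or the ordered words of a phrase. */
    static final class Clause {
        final boolean phrase;
        final List<String> words;

        Clause(boolean phrase, List<String> words) {
            this.phrase = phrase;
            this.words = words;
        }
    }

    /** Classifies each entry of myquery as a single term or a phrase. */
    static List<Clause> plan(String[] myquery) {
        List<Clause> clauses = new ArrayList<>();
        for (String entry : myquery) {
            if (entry == null) {
                continue;
            }
            // Assumption: the index holds lowercased tokens.
            String trimmed = entry.trim().toLowerCase(Locale.ROOT);
            if (trimmed.isEmpty()) {
                continue; // skip blank entries
            }
            List<String> words = new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
            // More than one word means the entry is treated as a phrase.
            clauses.add(new Clause(words.size() > 1, words));
        }
        return clauses;
    }
}
```

In Lucene itself, a single-word clause would map to a TermQuery on the subject field, and a phrase clause to a PhraseQuery built by calling add(new Term("subject", word)) once per word, in order. Either query can then be added to the BooleanQuery in the same way.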