Hi, I am using a Lucene-based index to solve the following problem:
1. I have documents with the following structure:

   docName: <something>
   includeKeywords: space-separated set of keywords
   excludeKeywords: space-separated set of keywords
   result: to be returned as the response
   ...

2. Each request contains a set of keywords. I have to find all documents for which every include keyword of the document is present in the request and none of the exclude keywords of the document is present in the request.

Example:

Doc:
   docName: "xyz"
   includeKeywords: ["ABC", "AB", "XYZ", "Z"]
   excludeKeywords: ["KL"]

Requests that will match the doc:
   1. ["ABC", "AB", "XYZ", "Z", "OP", "QR"]
   2. ["ABC", "AB", "XYZ", "Z"]

Requests that will not match the doc:
   1. ["ABC", "AB", "XYZ"]  (include keyword "Z" is missing)
   2. ["ABC", "AB", "XYZ", "Z", "KL"]  (exclude keyword "KL" is present)
   3. ["ABC", "AB", "XYZ", "Z", "OP", "QR", "KL"]  (exclude keyword "KL" is present)

I used WhitespaceAnalyzer to build the index, since it correctly tokenizes the space-separated keywords in the includeKeywords and excludeKeywords fields. For searching, I use a BooleanQuery that combines a MatchAllDocsQuery with other boolean queries added as MUST_NOT clauses.

With 120000 such documents indexed in Lucene, I am getting a response time of 7 ms on a machine with the following configuration:

   Architecture: x86_64
   CPU op-mode(s): 32-bit, 64-bit
   CPU(s): 8
   On-line CPU(s) list: 0-7
   Thread(s) per core: 1
   Core(s) per socket: 1
   Socket(s): 8
   CPU MHz: 2294.686
   BogoMIPS: 4589.37
   Virtualization type: full
   L2 cache: 256K
   L3 cache: 25600K

I would like help optimizing this, because 7 ms seems too slow for such a simple use case. Is there a way to tune the Lucene search for this case and bring the response time down?

Please let me know if anything in the problem description is unclear, and I will explain it again with more examples.

Thanks in advance.

Regards,
Apurv
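
P.S. In case the matching rule is easier to read as code, here is a minimal plain-Java sketch of the check that has to hold for each document. It does not use Lucene and is not the query I actually run; the class name KeywordMatchSketch and the helpers matches and setOf are made up purely for illustration, and the data is the example doc and requests from above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the matching rule only; this is not the Lucene query.
public class KeywordMatchSketch {

    // A document matches when every include keyword is in the request
    // and no exclude keyword is in the request.
    static boolean matches(Set<String> include, Set<String> exclude, Set<String> request) {
        return request.containsAll(include) && Collections.disjoint(exclude, request);
    }

    // Small helper to build a keyword set from literals.
    static Set<String> setOf(String... keywords) {
        return new HashSet<>(Arrays.asList(keywords));
    }

    public static void main(String[] args) {
        Set<String> include = setOf("ABC", "AB", "XYZ", "Z");
        Set<String> exclude = setOf("KL");

        // Requests that should match the example doc.
        System.out.println(matches(include, exclude, setOf("ABC", "AB", "XYZ", "Z", "OP", "QR"))); // true
        System.out.println(matches(include, exclude, setOf("ABC", "AB", "XYZ", "Z")));             // true

        // Requests that should not match the example doc.
        System.out.println(matches(include, exclude, setOf("ABC", "AB", "XYZ")));                  // false
        System.out.println(matches(include, exclude, setOf("ABC", "AB", "XYZ", "Z", "KL")));       // false
        System.out.println(matches(include, exclude, setOf("ABC", "AB", "XYZ", "Z", "OP", "QR", "KL"))); // false
    }
}
```

Note that this is the reverse of a normal keyword search: the request has to contain all of the document's include keywords, not the other way round, which is why I ended up with MatchAllDocs plus exclusion clauses.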