Hello,

I have a Lucene.NET index created with version 2.9.4.1. The index holds about 25 million entries (in the production environment I will have 50 million or more) and is 5.75 GB in size. The index is used for searching by text. I need to add new functionality that allows a query to be restricted to a specific date range in addition to the textual search (the query is text AND date range). The user can select either the last 7 days or the last 30 days.
The implementation I tried adds a new indexed-only numeric field representing the date. The date is indexed as an integer in the format yyyyMMdd. I index this field with a precision step of 1 (to make retrieval as fast as possible). At query time I create a BooleanQuery that contains the original query, and I add a MUST clause for the date range.

When I compare the results to regular textual queries, the queries with a date range are much slower. For each comparison I ran 10 queries for warm-up (I don't count their results), then another 90 queries whose results I do count.

I would appreciate suggestions and tips on how the performance of searching by dates can be improved. Below are the statistics for the runs, and the code for creating the fields and the query.

Thanks,
Avi

No changes (using index with no dates):

    08 18:17:01,213 [1] INFO: {(null)} - Min search time: 2
    08 18:17:01,213 [1] INFO: {(null)} - Max search time: 88
    08 18:17:01,213 [1] INFO: {(null)} - Average search time: 23.0674157303371
    08 18:17:01,213 [1] INFO: {(null)} - Search time Variance : 20.5
    08 18:17:01,213 [1] INFO: {(null)} - Number of results above 700ms: 0

Index with date (not using dates in query):

    08 18:22:49,093 [1] INFO: {(null)} - Min search time: 3
    08 18:22:49,093 [1] INFO: {(null)} - Max search time: 176
    08 18:22:49,093 [1] INFO: {(null)} - Average search time: 50.9325842696629
    08 18:22:49,093 [1] INFO: {(null)} - Search time Variance : 46.85
    08 18:22:49,093 [1] INFO: {(null)} - Number of results above 700ms: 0

With dates, last 7 days:

    08 19:38:17,988 [1] INFO: {(null)} - Min search time: 33
    08 19:38:17,988 [1] INFO: {(null)} - Max search time: 1668
    08 19:38:17,988 [1] INFO: {(null)} - Average search time: 704.741573033708
    08 19:38:17,988 [1] INFO: {(null)} - Search time Variance : 607.05
    08 19:38:17,988 [1] INFO: {(null)} - Number of results above 700ms: 44

With dates, last 30 days:

    08 19:48:17,123 [1] INFO: {(null)} - Min search time: 105
    08 19:48:17,123 [1] INFO: {(null)} - Max search time: 4808
    08 19:48:17,123 [1] INFO: {(null)} - Average search time: 2846.75280898876
    08 19:48:17,123 [1] INFO: {(null)} - Search time Variance : 1934.11
    08 19:48:17,123 [1] INFO: {(null)} - Number of results above 700ms: 72

Here are the field definitions:

    // Stored, indexed as a single un-analyzed term
    var idField = new Field( "ID", String.Empty, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS );
    document.Add( idField );

    // Stored only, not searchable
    var id2Field = new Field( "ID2", String.Empty, Field.Store.YES, Field.Index.NO );
    document.Add( id2Field );

    // Analyzed text fields used by the textual search
    var txtField = new Field( "txtField", String.Empty, Field.Store.NO, Field.Index.ANALYZED );
    document.Add( txtField );
    var txt2Field = new Field( "txt2Field", String.Empty, Field.Store.NO, Field.Index.ANALYZED );
    document.Add( txt2Field );
    var txt3Field = new Field( "txt3Field", String.Empty, Field.Store.NO, Field.Index.ANALYZED );
    document.Add( txt3Field );

    // The new date field: precision step 1, not stored, indexed
    var dateField = new NumericField( "Date", 1, Field.Store.NO, true );
    document.Add( dateField );

I set the values of the fields. For the new date field I set it like this:

    // SetIntValue takes a 32-bit integer, so the yyyyMMdd value is held in an Int32
    Int32 dateInt = <some date>;
    dateField.SetIntValue( dateInt );

The query:

    var fields = new String[3];
    Dictionary<String, Single> boosts = new Dictionary<String, Single>();
    fields[0] = "txtField";
    boosts.Add( fields[0], <Value> );
    fields[1] = "txt2Field";
    boosts.Add( fields[1], <Value> );
    fields[2] = "txt3Field";
    boosts.Add( fields[2], <Value> );

    MultiFieldQueryParser parser = new MultiFieldQueryParser( Version.LUCENE_29, fields, analyzer, boosts );

    // The original textual query is a required clause
    var boolQuery = new BooleanQuery();
    Query simpleParsedQuery = parser.Parse( queryText );
    boolQuery.Add( simpleParsedQuery, BooleanClause.Occur.MUST );

    // Date range as yyyyMMdd integers, inclusive on both ends
    DateTime beginDate = <date 7 or 30 days ago>;
    Int32 beginDateInt = beginDate.Day + beginDate.Month * 100 + beginDate.Year * 10000;
    DateTime now = DateTime.UtcNow;
    Int32 endDateInt = now.Day + now.Month * 100 + now.Year * 10000;

    // The date range is a second required clause
    NumericRangeQuery datesQuery = NumericRangeQuery.NewIntRange( "Date", beginDateInt, endDateInt, true, true );
    boolQuery.Add( datesQuery, BooleanClause.Occur.MUST );
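
For reference, here is a minimal sketch of how the yyyyMMdd values could be computed in one place, so that indexing and querying use exactly the same encoding. The class and method names (DateKey, ToInt, LastDays) are hypothetical and not part of Lucene.NET. It assumes the range is inclusive on both ends, as in the query above, and that the caller passes in the current time in whatever time zone was used at indexing time.

```csharp
using System;

// Hypothetical helper, not part of Lucene.NET:
// encodes dates as yyyyMMdd integers for the "Date" field.
public static class DateKey
{
    // 8 March 2012 becomes 20120308.
    public static int ToInt(DateTime date)
    {
        return date.Year * 10000 + date.Month * 100 + date.Day;
    }

    // Computes the inclusive [begin, end] keys for the last `days` days,
    // counting back from the calendar date of `now`.
    public static void LastDays(DateTime now, int days, out int begin, out int end)
    {
        DateTime today = now.Date;              // drop the time-of-day part
        begin = ToInt(today.AddDays(-days));    // AddDays handles month and year rollover
        end = ToInt(today);
    }
}
```

With this sketch, calling DateKey.LastDays( DateTime.UtcNow, 7, out beginDateInt, out endDateInt ) would produce the two bounds passed to NumericRangeQuery.NewIntRange, and DateKey.ToInt( someDate ) the value passed to SetIntValue.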