I have got a lot of valuable information in this thread so far. Thanks to all.
In my last mail I mentioned only two fields because others' usage was negligible and I thought they are not important. But now after *Toke *explained the formulae, I think sorting on those fields would also be consuming a huge part of memory.There are 2 other sorting fields; one of which is used in both ascending/descending sorting. Within next couple of days (or may be a week) I'll be 1. profiling my application, 2. analyzing and tuning GC options However, I have a few more curiosities - 1. Tom wrote: *Have you checked that your machine is correctly identified as a server* *and has optimized GC settings?* *I did not understand the meaning of "correctly identified as a server" Can you please help me understand?* * * 2. *Should I change the type of fields?* ** As I said in my first mail that I have 56 fields in my index, most of them contain a numeric value or one of system defined values (e.g. gender field can contain only "male", "female", or "unknown"). There are only 7 fields which are indexed with user defined values. All the fields are created with *Field* (String name, String value, Field.Store store, Field.Index index) It would be creating all the fields as normal string fields. Is it *always*a good idea to use specific classes (NumericField, DateTime etc.). We do not have space problem if that matters. 3. *Is there any advice on number of fields?* *Somewhere on the net I read that instead of keeping different type of values in different fields, (e.g. field1:value1, field2:value2,...) one should practice keeping different values in single field (e.g. field:field1_value1, field:field2_value2,...). But I could not confirm it from anywhere else. Any comments?* 4. Ian wrote: *Sorting by score down to the second will use a lot of memory. Can you* *make it less granular?* Is it less painful sorting on two fields; first on yymmdd and then on yymmddHHMMSS than sorting just on latter? (Naturally it should use second field, only where required but technically ...?) Thanks again for the invaluable support I am getting from here. - Samar On Wed, Apr 28, 2010 at 9:12 AM, Lance Norskog <goks...@gmail.com> wrote: > Solr's timestamp representation (TrieDateField) is tuned for space and > speed. It has a compressed representation, and sorts with far less > space than Strings. > > Also you get something called a date facet, which lets you bucketize > facet searches by time block. > > On Tue, Apr 27, 2010 at 1:02 PM, Toke Eskildsen <t...@statsbiblioteket.dk> > wrote: > > Samarendra Pratap [samarz...@gmail.com] wrote: > >> 1. Our default option is sort by score, however almost 8% of searches > use > >> sorting on a field (yyyymmddHHMMSS). This field is indexed as string > (not as > >> NumericField or DateField). > > > > Guessing that the timestamp is practically unique for each document, > sorting by String takes up a bit more than > > 18M * (40 bytes + 2 * "yyyymmddHHMMSS".length() bytes) ~= 1.2 GB of RAM > as the Strings are cached. Coupled with the normal overhead of just opening > an index of your size (500MB by your measurements?), I would have guessed > that 3600MB would definitely be enough to open the index and do sorted > searches. > > > > I realize that fiddling with production servers is dangerous, but > connecting with JConsole and forcing a garbage collection might be > acceptable? That should enable you to determine whether you're leaking > memory or if it's just the JVM being greedy. I'd guess you leaking though, > as HotSpot does not normally allocate up to the limit if it does not need > to. > > > > Anyway, changing to one of the optimized fields for sorting dates should > shave 1 GB off the memory requirement, so I'll recommend doing that no > matter what the main cause of your memory problems is. > > > > Regards, > > Toke Eskildsen > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > > > > -- > Lance Norskog > goks...@gmail.com > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- Regards, Samar