OK, I know lots of great work has been done to reduce the memory footprint for sorting and faceting, but what I'm seeing is drastic enough that I want to check whether I'm missing something. I'd also like to ask what finer-grained tools people are using to answer the question "How much more memory efficient is the new way of doing things?"
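(As a possible finer-grained alternative to eyeballing jConsole, here's a minimal sketch of pulling the same heap numbers programmatically over JMX from Solr's JVM. It's plain JDK, nothing Solr-specific; the port and the JMX startup flags are just assumptions for illustration, and MemoryMXBean.gc() is only a request to the JVM, so the figure is still approximate.)

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SolrHeapProbe {
    public static void main(String[] args) throws Exception {
        // Assumes Solr's JVM was started with remote JMX enabled, e.g.:
        //   -Dcom.sun.management.jmxremote.port=9999
        //   -Dcom.sun.management.jmxremote.authenticate=false
        //   -Dcom.sun.management.jmxremote.ssl=false
        // The host/port here are made up for illustration.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);

            memory.gc(); // same as the "Perform GC" button in jConsole; only a hint
            long used = memory.getHeapMemoryUsage().getUsed();
            System.out.printf("Heap used after GC: %.1f MB%n",
                    used / (1024.0 * 1024.0));
        } finally {
            connector.close();
        }
    }
}

Running it a couple of times before and after firing the query gives a steadier number than a single snapshot, but it's still the same crude measurement described below.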
Setup: I'm indexing 1.9M Wikipedia articles, firing up a fresh Solr, firing a relatively insane query at it while monitoring in jConsole, then doing a GC from jConsole and looking at the memory used by Solr. Crude, but I'm trying to get a flavor of what's going on here.

Field        Unique values   Type
id           1,917,727       string
user_sort    62,123          string
text         57,759          text (1.4.1 flavor for all three Solr versions)
user_id      62,122          int

http://localhost:8983/solr/select/?q=*:*&version=2.2&start=0&rows=10&indent=on&sort=user_sort asc, id desc&facet=on&facet.field=text&facet.field=user_id&facet.field=id

Yeah, yeah, yeah, faceting and sorting by a unique ID is silly. But it *does* stress memory. Anyway, here are the numbers I'm seeing:

1.4.1    328 M
3.2      328 M
trunk     90 M

And it's even more impressive than that when you consider that 20 M or so is just to get in the door.

Is it fair to say that the two big innovations that have reduced the memory footprint are:
1> going to byte arrays for string storage
2> the FST work?

Final question: it looks like the FST work has been back-ported to the current 3_x code branch, is that true? Anything else back-ported there? I'll check that branch out and give it a whirl for kicks.

Thanks,
Erick

A novice programmer gets a program to compile and says "I'm sure it'll run fine now."
A veteran programmer runs a program for the first time, gets the expected results and says "I must have done something wrong, that can't *really* be working."

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
