At a quick glance, I think this would be difficult. How could one estimate memory without loading the core? Facets in particular are sensitive to the number of unique terms in the field. One could probably work it backwards, that is, load the cores as necessary and _measure_ the memory consumption. You'd then have to store that information someplace, though.
It seems like you can get relatively close to this by specifying a set of cores with transient="false" and the rest with transient="true", but that's certainly not going to satisfy the complex requirements you've outlined. That said, it feels like your design is a band-aid: clients are _still_ going to put too much information on too little hardware, but you know your problem space better than I do.

But before you start working there, be aware that this code is evolving fairly quickly. SOLR-4662 should have the structure in reasonably stable condition, and I hope to get that done this coming weekend. You might want to wait until that gets committed to do more than exploratory work, as the code base may change out from underneath you.

Good luck!
Erick

On Tue, Apr 9, 2013 at 7:02 AM, Lyuba Romanchuk <[email protected]> wrote:
> It seems like bullets don't look nice, so I'm sending the explanation without bullets.
>
> The flow of the SolrCore.execute() function will be changed:
>
> Change the status of the core to “USED” and call the waitForResource(SolrRequestHandler, SolrQueryRequest) function; after that, perform the current SolrCore.execute() flow and change the status of the core to “UNUSED”.
>
> In the waitForResource(SolrRequestHandler, SolrQueryRequest) function, initially, estimate the required memory for this query/handler on this core. If there are not enough free resources to run the query, and there are still not enough after unloading all unused, non-permanent cores, throw an "OutOfMemoryError" exception and change the status of the core to “UNUSED”; otherwise, wait with a timeout until some resource is released, then check again until the required resource is available or the exception is thrown.
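[To make the proposed waitForResource() flow above concrete, here is a minimal, self-contained sketch. Everything in it is hypothetical: ResourceGovernor and its fields are not Solr APIs, memory "estimation" is stubbed with fixed per-core byte counts, and unloading just flips a flag where real code would call into CoreContainer.]

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the proposed resource gate; names and the
// fixed memoryBytes figures are made up, not Solr APIs.
public class ResourceGovernor {
    public static class Core {
        final String name;
        final boolean permanent;   // permanent cores are never unloaded
        final long memoryBytes;    // stand-in for an estimated footprint
        boolean loaded = true;
        boolean inUse = false;     // the "USED" / "UNUSED" status
        public Core(String name, boolean permanent, long memoryBytes) {
            this.name = name;
            this.permanent = permanent;
            this.memoryBytes = memoryBytes;
        }
    }

    private final long capacityBytes;
    private long reservedBytes = 0;
    private final List<Core> cores = new ArrayList<>();

    public ResourceGovernor(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    public synchronized void register(Core c) {
        cores.add(c);
    }

    private long freeBytes() {
        long loadedSum = 0;
        for (Core c : cores) {
            if (c.loaded) loadedSum += c.memoryBytes;
        }
        return capacityBytes - loadedSum - reservedBytes;
    }

    // Unload UNUSED, non-permanent cores until 'required' bytes are free.
    private boolean tryUnload(long required) {
        for (Core c : cores) {
            if (freeBytes() >= required) return true;
            if (c.loaded && !c.permanent && !c.inUse) {
                c.loaded = false;  // stand-in for CoreContainer.unload(...)
            }
        }
        return freeBytes() >= required;
    }

    // The proposed gate: block until memory is available or time out with OOM.
    public synchronized void waitForResource(Core core, long required, long timeoutMs)
            throws InterruptedException {
        core.inUse = true;                     // status -> "USED"
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!tryUnload(required)) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                core.inUse = false;            // status -> "UNUSED"
                throw new OutOfMemoryError("cannot free " + required + " bytes for query");
            }
            wait(remaining);                   // woken when release() frees memory
        }
        reservedBytes += required;             // reserve; caller runs execute() next
    }

    // Called after execute() finishes: mark UNUSED and wake waiters.
    public synchronized void release(Core core, long bytes) {
        reservedBytes -= bytes;
        core.inUse = false;
        notifyAll();
    }
}
```

[A caller would wrap the existing SolrCore.execute() between waitForResource() and release(); the loop-and-wait shape is the standard guarded-block pattern on a shared monitor.]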
>
> Best regards,
>
> Lyuba
>
>
> ---------- Forwarded message ----------
> From: Lyuba Romanchuk <[email protected]>
> Date: Tue, Apr 9, 2013 at 11:47 AM
> Subject: Adding new functionality to avoid "java.lang.OutOfMemoryError: Java heap space" exception
> To: [email protected]
>
>
> Hi all,
>
> We run Solr (4.2 and 5.0) in a real-time environment with big data. Each day two Solr cores are generated that can reach ~8-10 GB, depending on insertion rates and hardware.
>
> Currently, all cores are loaded on Solr startup.
>
> The query rate is not high, but the response must be quick and must be returned even for old data and over a large time frame.
>
> There are a lot of simple queries (facet/facet.pivot for small distributed fields), but there are also heavy queries like facet.pivot for large-scale distributed fields. We use distributed search to query the cores and, usually, query over 1-2 weeks (around 7-28 cores).
>
> After some large queries (with facet.pivot for wide distributed fields) we sometimes encounter a "java.lang.OutOfMemoryError: Java heap space" exception.
>
> The software is to be deployed to customer sites, so increasing memory would not always be possible, and the customers may want to get slower responses for the larger queries, if we can provide them.
>
> We looked at the LotsOfCores functionality that was added in 4.1 and 4.2. It enables defining an upper limit of online cores and unloading them when the cache gets full on an LRU basis. However, in our case it seems a more general use case is needed:
>
> * Only cores that are used for updates/inserts must be loaded at all times. Other cores, which are queried only, should be loaded/unloaded on demand while the query runs, until completion, according to memory demands.
>
> * Each facet and facet.pivot must be estimated for memory consumption.
> In case there is not enough memory to run the query for all cores concurrently, it must be separated into sequential queries, unloading already-queried or irrelevant cores (but not permanent cores) and loading older cores to complete the query.
>
> * Occasionally, the oldest cores should be unloaded according to a configurable policy (for example, one type of high-volume cores will be kept loaded for 1 week, while smaller cores can remain loaded for a month). The policy will allow data we know is queried less but is higher volume to be kept live over shorter time periods.
>
> We are considering adding the following functionality to Solr (optional, turned on by new configs):
>
> The flow of the SolrCore.execute() function will be changed:
>
> Change the status of the core to “USED”
> Call the waitForResource(SolrRequestHandler, SolrQueryRequest) function:
>   estimate the required memory for this query/handler on this core
>   if there are not enough free resources to run the query then
>     if all cores are permanent and can’t be unloaded then
>       throw an "OutOfMemoryError" exception // here the status of the core should be changed to “UNUSED”
>     else
>       try to unload unused, non-permanent cores
>       if unloading unused cores didn’t release enough resources and no core can be unloaded then
>         throw an "OutOfMemoryError" exception // here the status of the core should be changed to “UNUSED”
>       if unloading unused cores didn’t release enough resources and there are cores that can be unloaded then
>         wait with a timeout until some resource is released
>         check again until the required resource is available or the exception is thrown
>   reserve the resource
> Call the current SolrCore.execute()
> Change the status of the core to “UNUSED”
>
> We would like to get some initial feedback on the design / functionality we’re proposing, as we feel this really benefits real-time, high-volume indexing systems such as ours.
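[The configurable age-based unload policy proposed above could be sketched roughly as follows. This is an illustrative sketch only: RetentionPolicy, the core-type names, and the durations are all made up for the example, and a real implementation would drive actual core unloading rather than just answering yes/no.]

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-core-type retention policy:
// high-volume cores stay loaded for a shorter window than small cores.
public class RetentionPolicy {
    // maximum time a core of a given type stays loaded; values are examples
    private final Map<String, Duration> maxLoaded = new HashMap<>();

    public void rule(String coreType, Duration keepLoaded) {
        maxLoaded.put(coreType, keepLoaded);
    }

    // True if a core of 'coreType' created at 'created' should be unloaded now.
    public boolean shouldUnload(String coreType, Instant created, Instant now) {
        Duration keep = maxLoaded.get(coreType);
        if (keep == null) return false;  // no rule configured: keep loaded
        return Duration.between(created, now).compareTo(keep) > 0;
    }
}
```

[A background sweep could periodically call shouldUnload() for each loaded, non-permanent core and unload the ones past their window, matching the "1 week for high-volume, 1 month for small cores" example in the proposal.]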
> We are also happy to contribute the code back if you feel there is a need for this functionality.
>
> Best regards,
>
> Lyuba
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
