[jira] [Commented] (SOLR-9562) Minimize queried collections for time series alias

Erick Erickson (JIRA) Mon, 16 Oct 2017 06:57:13 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205928#comment-16205928
 ]


Erick Erickson commented on SOLR-9562:
--------------------------------------

Radu:

bq: That said, loading/unloading shards might help reduce the overhead of many 
shards, assuming that old data is rarely touched

In stand-alone mode there's the whole "transient core" concept: essentially, 
Solr cores are kept in a size-limited cache. When an operation is performed 
on a core that isn't already in the cache, the core is loaded on the fly, and 
if loading it puts the cache over its size limit, the least-recently-used core 
is unloaded. This is all totally automatic; the only thing the user has to 
configure is the size of the cache.
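
To make the load-on-demand / evict-least-recently-used behavior concrete, here 
is a minimal sketch. It is _not_ Solr's actual code; the class name, the loader 
and unloader callbacks and the method names are all made up for illustration, 
and it just leans on java.util.LinkedHashMap in access order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical sketch of a size-limited core cache; not Solr's implementation. */
final class TransientCoreCacheSketch<C> {
    private final Map<String, C> cache;
    private final Function<String, C> loader; // assumed: opens a core by name

    TransientCoreCacheSketch(int maxSize, Function<String, C> loader, Consumer<C> unloader) {
        this.loader = loader;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, C>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, C> eldest) {
                if (size() > maxSize) {
                    unloader.accept(eldest.getValue()); // assumed: closes the core
                    return true;                        // drop the LRU entry
                }
                return false;
            }
        };
    }

    /** Returns the cached core, loading it (and possibly evicting another) if absent. */
    synchronized C getOrLoad(String name) {
        C core = cache.get(name); // a hit also marks the core as recently used
        if (core == null) {
            core = loader.apply(name);
            cache.put(name, core); // may trigger removeEldestEntry
        }
        return core;
    }

    /** Reports whether a core is currently cached, without touching the LRU order. */
    synchronized boolean isLoaded(String name) {
        return cache.containsKey(name);
    }
}
```

With a cache size of 2, touching cores a, b, a, c in that order unloads b, since 
a was used more recently than b when c had to be loaded.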

This has _not_ been worked through with SolrCloud; all the decisions are made 
locally. The problems I foresee in the general SolrCloud case mainly have to do 
with thrashing when, say, updates are distributed: all the replicas for all 
the shards receiving updates would have to be loaded. There'd need to be some 
way to re-use the already-loaded replicas in a shard for queries until traffic 
exceeded some limit (why should Solr load 10 replicas of a shard for 10 
different queries if the query rate was only 10/minute?). Perhaps some of the 
new metrics could be used for that case....

Anyway, the transient core stuff was never envisioned with SolrCloud in mind, 
but it might be useful in this case.

> Minimize queried collections for time series alias
> --------------------------------------------------
>
>                 Key: SOLR-9562
>                 URL: https://issues.apache.org/jira/browse/SOLR-9562
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Eungsop Yoo
>            Priority: Minor
>         Attachments: SOLR-9562-v2.patch, SOLR-9562.patch
>
>
> For indexing time series data (such as large log data), we can create a new 
> collection regularly (hourly, daily, etc.) with a write alias and create a 
> read alias for all of those collections. But all of the collections of the 
> read alias are queried even if we search over a very narrow time window. In 
> this case, the docs to be queried may be stored in a very small portion of 
> the collections, so we don't need to query all of them.
> I suggest this patch for read aliases to minimize the queried collections. 
> Three parameters are added to the CREATEALIAS action.
> || Key || Type || Required || Default || Description ||
> | timeField | string | No | | The time field name for time series data. It 
> should be a date type. |
> | dateTimeFormat | string | No | | The format of the timestamp for collection 
> creation. Every collection should have a suffix (starting with "_") in this 
> format. 
> Ex. dateTimeFormat: yyyyMMdd, collectionName: col_20160927
> See 
> [DateTimeFormatter|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html].
>  |
> | timeZone | string | No | | The time zone information for dateTimeFormat 
> parameter.
> Ex. GMT+9. 
> See 
> [DateTimeFormatter|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html].
>  |
> Then, when we query with a filter query like "timeField:\[fromTime TO 
> toTime\]", only the collections that have docs for the given time range will 
> be queried.
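
The selection idea described in the issue can be sketched roughly as follows. 
This is not the code from the attached patches; the class and method names are 
hypothetical, and it assumes a date-only dateTimeFormat such as yyyyMMdd (so one 
collection per day) and ignores the timeZone parameter. Collections whose 
suffix can't be parsed are kept, so nothing is silently dropped from a query.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the SOLR-9562 patch. Assumes daily collections. */
final class TimeSeriesAliasSketch {

    /** Returns the collections whose "_<date>" suffix lies in [from, to], inclusive. */
    static List<String> collectionsInRange(List<String> collections,
                                           String dateTimeFormat,
                                           LocalDate from,
                                           LocalDate to) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(dateTimeFormat);
        List<String> selected = new ArrayList<>();
        for (String name : collections) {
            int idx = name.lastIndexOf('_');
            if (idx < 0) {
                selected.add(name); // no suffix: can't decide, so query it anyway
                continue;
            }
            try {
                LocalDate day = LocalDate.parse(name.substring(idx + 1), fmt);
                if (!day.isBefore(from) && !day.isAfter(to)) {
                    selected.add(name); // collection overlaps the requested range
                }
            } catch (DateTimeParseException e) {
                selected.add(name); // unparseable suffix: query it to be safe
            }
        }
        return selected;
    }
}
```

For example, with collections col_20160926, col_20160927 and col_20160928 and a 
range that covers only 2016-09-27, just col_20160927 would be queried.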



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
