David Smiley created SOLR-11299:
-----------------------------------

             Summary: Time partitioned collections (umbrella issue)
                 Key: SOLR-11299
                 URL: https://issues.apache.org/jira/browse/SOLR-11299
             Project: Solr
          Issue Type: New Feature
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
            Reporter: David Smiley
            Assignee: David Smiley


Solr ought to be able to manage large-scale time-series data (think logs, or
sensor/IoT data) itself, without a lot of manual or external work.  The most
naive and painless approach today is to create a collection with a high
numShards and hash routing, but this isn't as good as partitioning the
underlying indexes by time, for these reasons:

* Easy to scale up/down horizontally as data/requirements change.  (No need to
over-provision, use shard splitting, or re-index with a different config.)
* Faster queries:
    ** can search fewer shards, reducing overall load
    ** realtime search is more tractable (since most shards are stable, their
caches stay effective)
    ** "recent" shards (which may be queried more) can be allocated to faster
hardware
    ** aged-out data is simply removed, not marked as deleted.  Deleted docs
still have search overhead.
* An outage of one shard results in a degraded but sometimes still useful
system (a contiguous time range goes missing, compared to a random subset of
documents going missing under hash routing)

Ideally you could set this up once and then simply work with a collection
(potentially actually an alias) in the normal way (search or update), letting
Solr handle the addition of new partitions, the removal of old ones, and the
appropriate routing of requests depending on their nature.
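The lifecycle described above (routing updates by time, adding new partitions, removing old ones) might look roughly like the following minimal sketch. `TimeRoutedAlias` is a hypothetical in-memory model, not a Solr API: a dict of lists stands in for real collections, and adding or deleting a key stands in for collection admin calls. It assumes daily partitions and timezone-aware timestamps.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


class TimeRoutedAlias:
    """Hypothetical in-memory model of an alias over daily partitions.

    Not a Solr API: the dict of lists stands in for real collections, and
    creating/deleting a key stands in for collection admin calls.
    """

    def __init__(self, name: str, retention_days: int) -> None:
        self.name = name
        self.retention_days = retention_days
        self.partitions: dict[str, list[dict]] = {}

    def _partition_for(self, ts: datetime) -> str:
        # Route on the timestamp's UTC day, e.g. "logs_2017-09-01".
        # Assumes `ts` is timezone-aware.
        return f"{self.name}_{ts.astimezone(timezone.utc).date().isoformat()}"

    def add(self, doc: dict) -> str:
        """Route an update by its timestamp, creating the partition on first use."""
        target = self._partition_for(doc["timestamp"])
        self.partitions.setdefault(target, []).append(doc)
        return target

    def drop_expired(self, now: datetime) -> list[str]:
        """Remove whole partitions lying entirely before the retention window."""
        cutoff = self._partition_for(now - timedelta(days=self.retention_days))
        # Names share a prefix and end in ISO-8601 dates, so string order
        # matches chronological order.
        expired = sorted(p for p in self.partitions if p < cutoff)
        for p in expired:
            del self.partitions[p]
        return expired


if __name__ == "__main__":
    utc = timezone.utc
    logs = TimeRoutedAlias("logs", retention_days=2)
    for day in (1, 2, 3, 4):
        logs.add({"id": str(day), "timestamp": datetime(2017, 9, day, 12, tzinfo=utc)})
    print(sorted(logs.partitions))
    # ['logs_2017-09-01', 'logs_2017-09-02', 'logs_2017-09-03', 'logs_2017-09-04']
    print(logs.drop_expired(datetime(2017, 9, 4, 12, tzinfo=utc)))
    # ['logs_2017-09-01']
```

Comparing partition names as strings only works here because every name shares the same prefix and ISO-8601 dates sort chronologically; a real implementation would probably track each partition's start time explicitly.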

This is an umbrella issue for the particular tasks that will make it all
happen, tracked either as subtasks or via issue linking.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
