anandsubbu commented on a change in pull request #1572: METRON-2327 - Support 
for SOLR time-based arrays
URL: https://github.com/apache/metron/pull/1572#discussion_r357046733
 
 

 ##########
 File path: metron-platform/metron-solr/metron-solr-common/README.md
 ##########
 @@ -164,3 +164,142 @@ The `create_collection.sh` script depends on schemas 
installed in `$METRON_HOME/
 
 Additional schemas should be installed in that location if using the 
`create_collection.sh` script.  Any collection can be deleted with the 
`delete_collection.sh` script.
 These scripts use the [Solr Collection 
API](http://lucene.apache.org/solr/guide/7_4/collections-api.html).
+
+## Time routed alias support
+An alias is a pointer that points to a Collection.  Sending a document to an 
alias sends it to the collection the alias points too.
+The collection an alias points to can be changed with a single, low-cost 
operation. Time Routed Aliases (TRAs) is a SolrCloud feature 
+that manages an alias and a time sequential series of collections.
+
+A TRA automatically creates new collections and (optionally) deletes old ones 
as it routes documents to the correct collection 
+based on the timestamp of the event. This approach allows for indefinite 
indexing of data without degradation of performance otherwise 
+experienced due to the continuous growth of a single index.
+
+A TRA is defined with a minimum time and a defined interval period and SOLR 
provides a collection for each interval for a 
+contiguous set of datetime intervals from the start date to the maximum 
received document date. Collections are created to host documents based on 
examining the document's event-time. If a document does not currently 
+have a collection created for it, then starting at the minimum date SOLR will 
create a collection for each interval that does not have one
+ up until the interval period needed to store the current document.  
+
+See SOLR documentation 
[\(1\)](https://lucene.apache.org/solr/guide/7_4/time-routed-aliases.html) 
+[\(2\)](https://lucene.apache.org/solr/guide/7_4/collections-api.html#createalias)
 for more information.
+
+### Setting up Time routed alias support
+
+Using SOLR's tme-based routing requires using SOLR's native datetime types.  
At the moment, Metron uses the LongTrie field type
+to store dates, which is not a SOLR native datetime type.  At a later stage 
the Metron code-base will be changed to use SOLR native datetime types
+(as the LongTrie type is deprecated), but for now a workaround procedure has 
been created to allow for the use of time-based routing, while at the
+ same time allowing for Metron to continue to use the LongTrie type. This 
procedure only works for new collections, and is as follows:
+
+1. Add the following field type definition near the end of the schema.xml 
document (the entry must be inside the schema tags)
+    ```
+      <fieldType name="datetime" stored="false" indexed="false" 
multiValued="false" docValues="true" class="solr.DatePointField"/>
+    ```
+   
+   
+1. Add the following field definition near the start of the schema.xml 
document (the entry must be inside the schema tags)
+    ```
+      <field name="datetime" type="datetime" />
+    ```
+   
+   
+1. Create the configset for the collection:  Assuming that the relevant 
collections schema.xml and solrconfig.xml are located in 
+`$METRON_HOME/config/schema/$COLLECTION_NAME` folder, use the following 
command:
+    ```
+      $METRON_HOME/bin/create_configset $COLLECTION_NAME
+    ```
+   
+   
+1. Create the time-based routing alias for the collection:
+Assuming the following values:
+    * SOLR_HOST:  Host SOLR is installed on
+    
+    * ALIAS_NAME:  Name of the new alias
+    
+    * ROUTER_START: Beginning time-period datetime in ISO-8601 standard - 
milliseconds potion of the date must be 0, some examples are 
+'2018-01-14T21:00:00:00', 'NOW/SECOND', 'NOW/DAY'
+
+    * ROUTER_FIELD: The name of the field in the incoming document that 
contains the datetime to route on - field must be of SOLR type DateTrie or 
DatePoint. 
+    For METRON this is standardised as field `datetime`.
+    
+    * ROUTER_INTERVAL: SOLR Date math format. The interval of time that each 
collection holds. eg "+1DAY", "+6HOUR", "+1WEEK" (`+` must be URL encoded to 
`%2B` )
+    
+    * ROUTER_MAXFUTUREMS: Optional field containing the number of milliseconds 
into the future that it is considered valid to have an event time for.
+    Documents with an event time exceeding this time period in the future are 
considered invalid and an error is returned.  Used as a sanity check to prevent 
+    the creation of unnecessary collections due to corrupted datetime values 
in events. Defaults is to ignore anything more then 10 minutes into the future.
+    
+    * ROUTER_AUTODELETEAGE: Optional field in SOLR Date math format. If this 
field is present, any time a collection is created, 
+    the oldest collections are assessed for deletion. Collections are deleted 
if the datetime interval they represent is older then
+     NOW - AUTODELETE_INTERVAL.  eg -2WEEK, -3MONTH, -1YEAR.  (`-` is a valid 
URL character that does not need to URL encoded.) 
+    
+    * CONFIGSET: Name of the collection configset that was created in the 
previous step - this is used a template for new collections.
+    
+    * CREATE-COLLECTION.*: These allow for Create collection options (e.g. 
numShards or numReplicas) to be specified directly in the 
+    create alias command.  
+    
+    Then the following command will create a time-routed alias:
+    ```
+       curl "http://$SOLR_HOST:8983/solr/admin/collections?action=CREATEALIAS\
 
 Review comment:
   Minor nit... can we remove the spaces at the beginning of this block of 
code? Copy-pasting this will result in this command to fail.
   
   Also, in my case I had to set the `numShards` to 1 for the command to 
complete successfully. Not sure if that is something to do with the way my solr 
is configured.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to