keith-turner opened a new issue, #3397:
URL: https://github.com/apache/accumulo/issues/3397

   **Is your feature request related to a problem? Please describe.**
   
   With the introduction of scan servers and eventually consistent scans, users 
can set the property `sserver.cache.metadata.expiration` to determine how long 
scan servers cache the list of files for a tablet.  This property sets a rough 
upper bound on how old the tablet files will be when scanning a tablet on a 
scan server.
   
   Unwritten data in tablet server memory can persist for long periods of time, 
though, without ever being flushed to a file (which is what makes it visible to 
a scan server).  There is currently a property, `table.compaction.minor.idle`, 
that causes a minor compaction if a tablet has not been written to in that time 
period.  However, if the tablet is constantly being written to at a slow rate, 
it will not hit the idle time and may not hit the size threshold for a long 
time, so data could be held in memory and not be visible to the scan server for 
long periods of time.
   
   **Describe the solution you'd like**
   
   A new tablet property that forces tablets to write out their in-memory data 
after a specified amount of time.  The implementation could track the time when 
the first write is made to tablet memory and then force a minor compaction when 
the time since the first write exceeds the configured value.  A possible name 
for the new property could be `table.compaction.minor.maxAge`.
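
The tracking described above could be sketched roughly as follows. This is only an illustration of the idea, not Accumulo code: the class `TabletMemoryAge` and its methods `onWrite`, `shouldFlush`, and `onFlush` are hypothetical names, and the caller is assumed to pass in a monotonic clock reading such as `System.nanoTime()`.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of Accumulo) that remembers when the first
 * write landed in a tablet's in-memory map since the last flush.
 */
final class TabletMemoryAge {
  private final long maxAgeNanos;
  private boolean hasUnflushedData = false;
  private long firstWriteNanos;

  TabletMemoryAge(long maxAge, TimeUnit unit) {
    this.maxAgeNanos = unit.toNanos(maxAge);
  }

  /** Record a write; only the first write after a flush sets the timestamp. */
  synchronized void onWrite(long nowNanos) {
    if (!hasUnflushedData) {
      hasUnflushedData = true;
      firstWriteNanos = nowNanos;
    }
  }

  /** True when unflushed data has been in memory longer than the max age. */
  synchronized boolean shouldFlush(long nowNanos) {
    return hasUnflushedData && (nowNanos - firstWriteNanos) > maxAgeNanos;
  }

  /** Call after a minor compaction completes so the timer starts over. */
  synchronized void onFlush() {
    hasUnflushedData = false;
  }
}
```

Because later writes never move the timestamp, a tablet that receives a slow but steady stream of writes would still become eligible for a flush once the oldest unflushed write is older than the configured age.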
   
   With this new property, `sserver.cache.metadata.expiration` + 
`table.compaction.minor.maxAge` gives an upper bound on how old the data seen by 
an eventually consistent scan would be expected to be.
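
As a worked example of that sum, with made-up values of 5 minutes for `sserver.cache.metadata.expiration` and 10 minutes for the proposed `table.compaction.minor.maxAge`, the expected bound would be about 15 minutes. The class and method names below are hypothetical, and the result is only a rough bound since it ignores the time the flush itself takes.

```java
import java.time.Duration;

/** Hypothetical illustration of the staleness bound; not Accumulo code. */
final class StalenessBound {
  /** Rough worst-case age of data seen by an eventually consistent scan. */
  static Duration upperBound(Duration metadataExpiration, Duration minorMaxAge) {
    return metadataExpiration.plus(minorMaxAge);
  }
}
```

Calling `StalenessBound.upperBound(Duration.ofMinutes(5), Duration.ofMinutes(10))` yields a `Duration` of 15 minutes.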
   
   I am also wondering whether `sserver.cache.metadata.expiration` should be a 
per-table property.  Tablet metadata could then be cached for different time 
periods in scan servers for different tables.  As a scan-server-wide property, 
it has to be set to meet the needs of the table with the lowest tolerance for 
stale data.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.