[GitHub] [druid] capistrant commented on issue #12526: Enhance the logic used by the Coordinator when determining if an unused segment can be killed (permanently deleted)

GitBox Tue, 24 May 2022 07:05:05 -0700


capistrant commented on issue #12526:
URL: https://github.com/apache/druid/issues/12526#issuecomment-1135972818


   3 implementation options for "last_used" with some pros/cons
   
   ## Embed within payload column
   
   This avoids altering the schema and instead adds the last_used attribute 
within the payload, which is essentially a JSON blob.
   
   ### Pros
   * No schema change for operators
   
   ### Cons
   * Kind of an obscure solution: it hides the details in a blob of data that 
is somewhat unrelated to what we are adding.
   * We need to pull back the payload whenever we have to read/update 
last_used, which costs disk I/O, network, CPU on the coordinator, etc.
   * Probably adds some complexity to the code (relative to the other 
solutions), since we have to serialize/deserialize the payload to get at 
last_used and read/update it.
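
A minimal sketch of the read/modify/write cycle this option implies. It uses an in-memory SQLite database and a simplified, hypothetical `segments` table as a stand-in for the real metastore; the `last_used` key name inside the payload and the helper names are assumptions for illustration only.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical, simplified stand-in for the segments table (not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE segments (id TEXT PRIMARY KEY, used INTEGER, payload TEXT)")
conn.execute(
    "INSERT INTO segments VALUES (?, ?, ?)",
    ("seg-1", 1, json.dumps({"dataSource": "wiki", "size": 1024})),
)


def set_last_used(conn, segment_id, when):
    """Read the whole payload, patch last_used, and write the whole payload back."""
    (raw,) = conn.execute(
        "SELECT payload FROM segments WHERE id = ?", (segment_id,)
    ).fetchone()
    payload = json.loads(raw)                # deserialize the full blob
    payload["last_used"] = when.isoformat()  # assumed attribute name
    conn.execute(
        "UPDATE segments SET payload = ? WHERE id = ?",
        (json.dumps(payload), segment_id),   # reserialize the full blob
    )


def get_last_used(conn, segment_id):
    """Fetch and deserialize the payload just to read one attribute."""
    (raw,) = conn.execute(
        "SELECT payload FROM segments WHERE id = ?", (segment_id,)
    ).fetchone()
    value = json.loads(raw).get("last_used")
    return datetime.fromisoformat(value) if value else None


stamp = datetime(2022, 5, 24, 14, 0, tzinfo=timezone.utc)
set_last_used(conn, "seg-1", stamp)
```

Note that the metastore cannot filter on `last_used` in a plain WHERE clause here; every candidate payload has to come back to the caller first.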
   
   ## New Column populated by a trigger
   
   This adds a new last_used column to the schema of the segments table. 
However, we do not explicitly update last_used from Druid; instead, a 
database trigger updates the column when needed.
   
   ### Pros
   * The new column avoids some of the obscurity of embedding in the payload. 
Instead we can easily reference last_used in the actual metastore query text. 
No need to deserialize the blob and then work on it in code.
   * Despite the schema change, not all operators would need to make the change 
to upgrade. The code could be written to handle the column not existing if the 
user isn't using the new feature that requires last_used.
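
One way the "handle the column not existing" idea could look, sketched against SQLite. The `PRAGMA table_info` lookup is SQLite-specific (a real metastore would need its own metadata query), and the table layout, function names, and fallback behavior are all assumptions for illustration.

```python
import sqlite3


def has_column(conn, table, column):
    """Return True if `table` has `column` (SQLite-specific PRAGMA lookup)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)  # row[1] is the column name


def killable_segment_ids(conn, cutoff):
    """Pick unused segments, filtering on last_used only when the column exists."""
    if has_column(conn, "segments", "last_used"):
        sql = "SELECT id FROM segments WHERE used = 0 AND last_used <= ? ORDER BY id"
        return [r[0] for r in conn.execute(sql, (cutoff,))]
    # Column absent: fall back to a query that ignores last_used entirely.
    sql = "SELECT id FROM segments WHERE used = 0 ORDER BY id"
    return [r[0] for r in conn.execute(sql)]


# Operator who skipped the schema change.
old = sqlite3.connect(":memory:")
old.execute("CREATE TABLE segments (id TEXT PRIMARY KEY, used INTEGER)")
old.executemany("INSERT INTO segments VALUES (?, ?)", [("a", 0), ("b", 1)])

# Operator who applied the schema change.
new = sqlite3.connect(":memory:")
new.execute("CREATE TABLE segments (id TEXT PRIMARY KEY, used INTEGER, last_used TEXT)")
new.executemany(
    "INSERT INTO segments VALUES (?, ?, ?)",
    [("a", 0, "2022-01-01 00:00:00"), ("c", 0, "2022-05-20 00:00:00")],
)
```

Both code paths have to be kept working for as long as the column stays optional.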
   
   ### Cons
   * Because the schema change is optional, any future use of the column is 
tied to this new feature. So what is a benefit for operators now may be a time 
bomb for developers/operators down the road.
   * The trigger uses the clock of the metastore server when updating the 
column, so we need to keep using the metastore clock going forward whenever we 
use the date in the column for Druid logic.
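
A rough sketch of the trigger approach, again using SQLite as a stand-in. The trigger name, the choice to fire on any change of `used`, and the `eligible_for_kill` helper are assumptions; trigger syntax also differs between databases, so this only shows the shape of the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE segments (id TEXT PRIMARY KEY, used INTEGER, last_used TEXT);

    -- Stamp last_used with the database server's clock whenever `used` changes.
    CREATE TRIGGER segments_touch_last_used
    AFTER UPDATE OF used ON segments
    FOR EACH ROW
    WHEN NEW.used <> OLD.used
    BEGIN
        UPDATE segments SET last_used = datetime('now') WHERE id = NEW.id;
    END;

    INSERT INTO segments (id, used) VALUES ('seg-1', 1);
    """
)

# Application code only flips `used`; it never writes last_used itself.
conn.execute("UPDATE segments SET used = 0 WHERE id = 'seg-1'")


def eligible_for_kill(conn, offset):
    """Unused segments older than `offset` (a SQLite date modifier such as '-30 days').

    The cutoff is computed by the database, so both sides of the comparison
    come from the same clock that the trigger used.
    """
    sql = (
        "SELECT id FROM segments "
        "WHERE used = 0 AND last_used <= datetime('now', ?) ORDER BY id"
    )
    return [r[0] for r in conn.execute(sql, (offset,))]
```

Computing the cutoff inside the query is what keeps the comparison on the metastore clock; passing in a timestamp taken on the Coordinator would reintroduce the skew concern.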
   
   ## New column populated by Druid code
   
   The same as the previous solution, except that Druid code handles updating 
the column instead of a trigger.
   
   ### Pros
   * Same pros as the previous solution regarding the ease of use of a column 
over the JSON blob.
   * No longer need to worry about using the metastore clock to avoid skew.
   
   ### Cons
   * We will now need to require the schema change for all upgrades, regardless 
of whether operators want to use the new column or not.
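
For comparison, a minimal sketch of the column being maintained by application code. SQLite stands in for the metastore, and the table layout and the `mark_unused` / `eligible_for_kill` helpers are hypothetical; the point is only that the write and the later comparison both use a timestamp supplied by the caller.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE segments (id TEXT PRIMARY KEY, used INTEGER, last_used TEXT)")
conn.execute("INSERT INTO segments (id, used) VALUES ('seg-1', 1)")


def mark_unused(conn, segment_id, now):
    """Flip `used` and stamp last_used in one statement, using the caller's clock."""
    conn.execute(
        "UPDATE segments SET used = 0, last_used = ? WHERE id = ?",
        (now.isoformat(), segment_id),
    )


def eligible_for_kill(conn, now, retention):
    """Unused segments whose last_used is at least `retention` old, by the caller's clock."""
    cutoff = (now - retention).isoformat()
    rows = conn.execute(
        "SELECT id FROM segments WHERE used = 0 AND last_used <= ? ORDER BY id",
        (cutoff,),
    )
    return [r[0] for r in rows]


t0 = datetime(2022, 5, 24, 14, 0, tzinfo=timezone.utc)
mark_unused(conn, "seg-1", t0)
```

The string comparison works here only because every timestamp is written in the same ISO 8601 form with the same UTC offset; a real implementation would need to pin that format down.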


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


