capistrant commented on issue #12526: URL: https://github.com/apache/druid/issues/12526#issuecomment-1135972818
3 implementation options for "last_used" with some pros/cons

## Embed within payload column

This avoids altering the schema and instead adds the last_used attribute within the payload, which is essentially a JSON blob.

### Pros

* No schema change for operators.

### Cons

* Kind of an obscure solution. It hides the details in a blob of data that is somewhat unrelated to what we are adding.
* Need to pull back the payload whenever we have to read/update last_used. Costs disk I/O, network, CPU on the coordinator, etc.
* Probably adds some complexity to the code, which has to serialize/deserialize the payload to get at last_used and read/update it (relative to the other solutions).

## New column populated by a trigger

This adds a new last_used column to the schema of the segments table. However, we do not explicitly update last_used from Druid; rather, a trigger updates the column when needed.

### Pros

* The new column avoids some of the obscurity of embedding in the payload. Instead we can easily reference last_used in the actual metastore query text. No need to deserialize the blob and then work on it in code.
* Despite the schema change, not all operators would need to make the change to upgrade. The code could be written to handle the column not existing if the user isn't using the new feature that requires last_used.

### Cons

* Future use of the column is tied to this new feature, so what is a benefit for operators now may be a time bomb for developers/operators down the road.
* The trigger uses the clock of the metastore server when updating the column, so we need to continue to use the metastore clock going forward whenever we use the date in the column for Druid logic.

## New column populated by Druid code

The same as the previous solution, except that Druid code handles the update of the column.

### Pros

* Same pros as the previous solution regarding the ease of use of a column over the JSON blob.
* No longer need to worry about using the metastore clock to avoid skew.

### Cons

* We will now need to require the schema change for all upgrades, regardless of whether they want to use the new column or not.
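
To make the trade-offs concrete, here is a rough sketch of the three write paths against an in-memory SQLite database. This is only an illustration, not Druid code: the `segments` table is a simplified, hypothetical stand-in for the real segments table, all function names are made up, and it assumes last_used is stamped when a segment's `used` flag changes. The trigger syntax is SQLite's and would differ on other metastores.

```python
import json
import sqlite3
from datetime import datetime, timezone


def make_db():
    """Create a hypothetical, simplified segments table with one used segment."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE segments ("
        " id TEXT PRIMARY KEY,"
        " used INTEGER NOT NULL,"
        " payload TEXT NOT NULL,"
        " last_used TEXT)"  # only options 2 and 3 need this column
    )
    conn.execute(
        "INSERT INTO segments (id, used, payload) VALUES (?, 1, ?)",
        ("seg1", json.dumps({"dataSource": "wiki"})),
    )
    return conn


def utc_now():
    """Timestamp taken from the application clock (not the metastore clock)."""
    return datetime.now(timezone.utc).isoformat()


def mark_unused_payload(conn, seg_id, now):
    """Option 1: last_used lives inside the JSON payload.

    Every update is a read, deserialize, modify, serialize, write round trip.
    """
    (raw,) = conn.execute(
        "SELECT payload FROM segments WHERE id = ?", (seg_id,)
    ).fetchone()
    payload = json.loads(raw)
    payload["last_used"] = now
    conn.execute(
        "UPDATE segments SET used = 0, payload = ? WHERE id = ?",
        (json.dumps(payload), seg_id),
    )


def install_trigger(conn):
    """Option 2: a trigger stamps last_used using the database's own clock."""
    conn.execute(
        "CREATE TRIGGER segments_last_used AFTER UPDATE OF used ON segments "
        "BEGIN "
        "UPDATE segments SET last_used = strftime('%Y-%m-%dT%H:%M:%fZ', 'now') "
        "WHERE id = NEW.id; "
        "END"
    )


def mark_unused_trigger(conn, seg_id):
    """Option 2 write path: the application never mentions last_used."""
    conn.execute("UPDATE segments SET used = 0 WHERE id = ?", (seg_id,))


def mark_unused_app(conn, seg_id, now):
    """Option 3: the application sets last_used itself, with its own clock."""
    conn.execute(
        "UPDATE segments SET used = 0, last_used = ? WHERE id = ?",
        (now, seg_id),
    )
```

With options 2 and 3, a query such as `SELECT id FROM segments WHERE used = 0 AND last_used < ?` can filter in SQL, whereas option 1 has to fetch and parse each payload first. The only difference between options 2 and 3 in this sketch is whose clock produces the timestamp: the database's `strftime(..., 'now')` or the application's `utc_now()`.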
