jihoonson opened a new issue, #5859: URL: https://github.com/apache/druid/issues/5859
Druid currently has a table in the metastore for storing task audit logs. A taskAction is stored in this table if it needs to be audited (https://github.com/druid-io/druid/blob/master/indexing-service/src/main/java/io/druid/indexing/common/actions/TaskAction.java#L47). Basically, all taskActions that modify the metastore, like `SegmentInsertAction`, currently need to be audited. If a taskAction needs to be audited, it is serialized into a byte array and stored as a blob in the metastore.

In Druid, there is a single use case for the task audit log table: an overlord API to get the list of segments generated by a task ([http://<OVERLORD_IP>:<port>/druid/indexer/v1/task/{taskId}/segments](http://druid.io/docs/latest/design/indexing-service.html#submitting-tasks-and-querying-task-status)). This API needs to find all `SegmentInsertAction`s and `SegmentTransactionalInsertAction`s performed by the given task. AFAIK, this API isn't widely used. Outside of Druid, AFAIK, there is no good use case for the task audit log table.

The problem is that inserting a large taskAction into this table can fail. For example, a `SegmentInsertAction` can be large if it inserts many segments. In MySQL, the insertion can fail with an error if the data size is larger than [max_allowed_packet](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_max_allowed_packet). I tried to mitigate this in https://github.com/druid-io/druid/pull/5751, but it doesn't look like a good idea. Another suggestion in that PR was to store audit logs in deep storage, but that requires quite a lot of changes and introduces new design issues. I don't think we need to put a large effort into fixing an issue in a feature that has no good use case.

So, I'm suggesting dropping the task audit log table. Here is my suggestion.

1) Disabling task audit logging by default in 0.13. We need to add a configuration for optionally enabling task audit logging.
The overlord API should also be deprecated.
2) Completely removing the task audit log table and the overlord API in 0.14.

Any thoughts?
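
Step 1 of the proposal could be sketched roughly as follows. This is only an illustration under stated assumptions, not Druid code: `Action`, `AuditLog`, the `enabled` flag, and `maxPayloadBytes` are hypothetical names, the audit table is replaced by an in-memory list, and the size check merely imitates a limit such as MySQL's `max_allowed_packet` by throwing an exception.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TaskAuditSketch {
    /** Hypothetical stand-in for a task action; this is NOT Druid's TaskAction interface. */
    interface Action {
        boolean isAudited();
        byte[] serialize();
    }

    /** Hypothetical in-memory stand-in for the task audit log table. */
    static class AuditLog {
        private final boolean enabled;      // proposed config flag; would default to false
        private final int maxPayloadBytes;  // imitates a limit such as max_allowed_packet
        final List<byte[]> rows = new ArrayList<>();

        AuditLog(boolean enabled, int maxPayloadBytes) {
            this.enabled = enabled;
            this.maxPayloadBytes = maxPayloadBytes;
        }

        /** Stores the serialized action and returns true, or returns false if auditing is skipped. */
        boolean record(Action action) {
            if (!enabled || !action.isAudited()) {
                return false; // auditing disabled: nothing is serialized or stored
            }
            byte[] payload = action.serialize();
            if (payload.length > maxPayloadBytes) {
                // Imitates the insertion error described above for oversized blobs.
                throw new IllegalStateException(
                    "payload of " + payload.length + " bytes exceeds limit of " + maxPayloadBytes);
            }
            rows.add(payload);
            return true;
        }
    }

    /** Builds an audited action whose serialized size grows with the number of segments. */
    static Action insertAction(int segmentCount) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < segmentCount; i++) {
            sb.append("segment_").append(i).append(';');
        }
        final byte[] bytes = sb.toString().getBytes(StandardCharsets.UTF_8);
        return new Action() {
            @Override public boolean isAudited() { return true; }
            @Override public byte[] serialize() { return bytes; }
        };
    }

    public static void main(String[] args) {
        AuditLog disabled = new AuditLog(false, 16);
        // With auditing off, even a large action is simply not recorded.
        System.out.println("stored when disabled: " + disabled.record(insertAction(1000)));

        AuditLog enabled = new AuditLog(true, 16);
        try {
            enabled.record(insertAction(1000));
        } catch (IllegalStateException e) {
            System.out.println("insert failed when enabled: " + e.getMessage());
        }
    }
}
```

With the flag off, the large action is never serialized, so the size limit cannot be hit. Turning the flag on restores the current behavior, including the failure on oversized actions.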
