jihoonson opened a new issue, #5859:
URL: https://github.com/apache/druid/issues/5859

   Druid currently has a table in the metastore for task audit logs. Every 
taskAction that needs to be audited is stored in this table 
(https://github.com/druid-io/druid/blob/master/indexing-service/src/main/java/io/druid/indexing/common/actions/TaskAction.java#L47).
 Basically, every taskAction that modifies the metastore, such as 
`SegmentInsertAction`, currently needs to be audited. An audited taskAction is 
serialized into a byte array and stored as a blob in the metastore.
   
   In Druid, there is a single use case for the task audit log table: an 
overlord API to get a list of segments generated by a task 
([http://<OVERLORD_IP>:<port>/druid/indexer/v1/task/{taskId}/segments](http://druid.io/docs/latest/design/indexing-service.html#submitting-tasks-and-querying-task-status)).
 This API needs to find all `SegmentInsertAction`s and 
`SegmentTransactionalInsertAction`s performed by the given task. AFAIK, this 
API isn't widely used, and outside of Druid I'm not aware of any good use case 
for the task audit log table.
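   
   To make the lookup concrete, here is a minimal sketch of what that API 
conceptually has to do. It is not Druid's actual code: `AuditedAction` and 
`segments_generated_by` are hypothetical names, and the audit log is modeled as 
a plain in-memory list instead of a metastore table.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for one audited taskAction row; not a Druid class.
@dataclass
class AuditedAction:
    task_id: str
    action_type: str  # e.g. "SegmentInsertAction"
    segments: list = field(default_factory=list)

# The two action types the overlord API has to look for.
INSERT_ACTION_TYPES = {"SegmentInsertAction", "SegmentTransactionalInsertAction"}

def segments_generated_by(task_id, audit_log):
    """Collect the segments of every insert-type action performed by task_id."""
    found = []
    for action in audit_log:
        if action.task_id == task_id and action.action_type in INSERT_ACTION_TYPES:
            found.extend(action.segments)
    return found

audit_log = [
    AuditedAction("task_a", "SegmentInsertAction", ["seg_1", "seg_2"]),
    AuditedAction("task_a", "OtherAction"),
    AuditedAction("task_b", "SegmentTransactionalInsertAction", ["seg_3"]),
    AuditedAction("task_a", "SegmentTransactionalInsertAction", ["seg_4"]),
]
print(segments_generated_by("task_a", audit_log))
```

   Nothing else in Druid reads these rows, which is why the table has only this 
one consumer.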
   
   The problem is that inserting a large taskAction into this table can fail. 
For example, a `SegmentInsertAction` can be large if it inserts many 
segments. In MySQL, the insertion fails with an error if the data size is larger 
than 
[max_allowed_packet](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_max_allowed_packet).
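   
   The following sketch shows how the payload grows with the number of segments. 
It makes several assumptions: the action is serialized as JSON, the payload 
shape in `make_insert_action` is made up for illustration, and the limit is 
passed in as a parameter instead of being read from a real MySQL server. The 
real packet also carries the rest of the SQL statement, so the actual threshold 
is slightly lower than the blob size alone suggests.

```python
import json

def make_insert_action(num_segments):
    """Build a hypothetical insert action; the field names are illustrative only."""
    return {
        "type": "SegmentInsertAction",
        "segments": [{"id": f"segment_{i}"} for i in range(num_segments)],
    }

def serialized_size(action):
    """Size in bytes of the action once serialized to a UTF-8 JSON blob."""
    return len(json.dumps(action).encode("utf-8"))

def fits_in_packet(action, max_allowed_packet):
    """True if the serialized blob alone is no larger than the given limit."""
    return serialized_size(action) <= max_allowed_packet

# An arbitrary small limit chosen so the effect is visible; not a MySQL default.
limit = 1024
for n in (1, 10, 10_000):
    action = make_insert_action(n)
    print(n, serialized_size(action), fits_in_packet(action, limit))
```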
   
   I tried to mitigate this in https://github.com/druid-io/druid/pull/5751, but 
it doesn't look like a good idea. Another suggestion in that PR was to store 
audit logs in deep storage, but that requires quite a lot of changes and 
introduces new design issues. I don't think it's worth a large effort to fix an 
issue in a feature that has no good use case.
   
   So, I suggest dropping the task audit log table in two steps:
   
   1) Disable task audit logging by default in 0.13. We need to add a 
configuration for optionally enabling task audit logging. The overlord API 
should also be deprecated.
   2) Completely remove the task audit log table and the overlord API in 0.14.
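   
   As a rough illustration of step 1, the gate could look like the sketch below. 
Everything here is hypothetical: `TaskAuditLogger` and its `enabled` flag are 
names invented for this example, and the proposal does not fix how the real 
configuration would be named or wired in.

```python
class TaskAuditLogger:
    """Hypothetical audit logger that is disabled unless explicitly enabled."""

    def __init__(self, enabled=False):
        # Off by default, matching the proposed 0.13 behavior.
        self.enabled = enabled
        self.entries = []

    def log(self, task_id, action):
        """Record the action only when auditing is enabled; report whether it was stored."""
        if not self.enabled:
            return False
        self.entries.append((task_id, action))
        return True

default_logger = TaskAuditLogger()
opt_in_logger = TaskAuditLogger(enabled=True)
print(default_logger.log("task_a", "SegmentInsertAction"))
print(opt_in_logger.log("task_a", "SegmentInsertAction"))
```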
   
   Any thoughts?

