georgew5656 commented on PR #16512:
URL: https://github.com/apache/druid/pull/16512#issuecomment-2137760412

   > @georgew5656 , thanks for the changes!
   > 
   > It makes sense to fail early on large payloads with a clear error message, 
but I have some concerns.
   > 
   > * Such tasks already fail on insertion into the metadata store. Do you 
think the current error message in this case is not clear enough? Instead of a 
new check/config, should we try to improve the existing error message by maybe 
catching it and throwing a cleaner `DruidException`?
   > * Or do you feel that there are other challenges with large task payloads 
too (say, Overlord facing a memory crunch)?
   > * One problem with introducing this new config with a default of 64MiB is 
that some clusters would already be using large task payloads and would have 
set their `max_allowed_packet` to large enough values that allow such payloads. 
And if we go with a default that is large enough to accommodate all clusters, 
then we would probably defeat the purpose of the config itself.
   > * There is also the MSQ case to consider, where the user just submits a SQL 
query and the task payload is actually generated by the system. How does the user 
control the task payload size in that case? Maybe we don't fail for MSQ task 
types?
   > 
   > One compromise could be to keep the check and just _warn_ about large task 
payloads for the time being, with a log message and maybe also an alert event.
   > 
   > Let me know what you think!
   
   The direct problem seems to be that the error message thrown when the 
write to the DB fails is a little TOO descriptive in this case: it tries to log the 
whole payload, which can easily cause memory issues.
   
   I don't think the config totally loses its purpose with a higher default, 
since you can still choose to lower it to limit the blast radius of large 
payloads.
   
   Maybe it would make sense to just truncate that error message and log a 
warning in OverlordResource when a task payload is very large? Because I don't 
think a task payload above a certain size makes sense anyway.
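   To illustrate the idea, here is a minimal sketch (not actual Druid code; the 
class, method names, and the 64 MiB threshold are all hypothetical placeholders 
for whatever the PR settles on) of truncating a payload before logging it and 
deciding when to warn:
   
   ```java
   // Hypothetical helper: keep log lines readable when a task payload is huge,
   // instead of dumping the entire serialized payload into the error message.
   public class TaskPayloadLogUtils
   {
     // Assumed warning threshold; the actual default is what this PR is debating.
     static final int WARN_THRESHOLD_BYTES = 64 * 1024 * 1024; // 64 MiB
     static final int LOG_SNIPPET_CHARS = 1024;
   
     /** Returns a log-safe snippet of the payload, truncated with a marker. */
     static String truncateForLog(String payload)
     {
       if (payload.length() <= LOG_SNIPPET_CHARS) {
         return payload;
       }
       return payload.substring(0, LOG_SNIPPET_CHARS)
              + "... [truncated, total length=" + payload.length() + "]";
     }
   
     /** True if the serialized payload is large enough to warrant a warning. */
     static boolean shouldWarnOnSize(byte[] serializedPayload)
     {
       return serializedPayload.length > WARN_THRESHOLD_BYTES;
     }
   }
   ```
   
   The point is just that the DB-insert failure path would log 
`truncateForLog(payload)` rather than the raw payload, and the warn path stays 
independent of whether the insert ultimately succeeds.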


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
