aewhite opened a new issue #10289:
URL: https://github.com/apache/druid/issues/10289


   ### Affected Version
   
   0.19
   
   ### Description
   
   While ingesting a large amount of data with the parallel index method, 
subtasks (and sometimes even the parent task) fail with an exception caused 
by 503 (SlowDown) responses from S3. This is likely because (1) we have many 
files and (2) we were querying the same data with Athena at the same time. 
Either way, the expectation is that Druid would retry these transient 
failures and expose configuration for tuning the backoff and retry policy.
   
   We currently have several workarounds:
   
   1. Use fewer, larger files, but this has disadvantages for our ETL pipeline.
   2. Limit other high-volume jobs reading from the bucket, but this directly 
impacts jobs that can properly handle these failures.
   3. Build retry logic into our Druid loading process, but this logic seems 
better suited for Druid itself to handle.
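   To illustrate workaround #3, here is a minimal sketch of the kind of 
client-side retry logic we ended up writing around our loading calls. All 
names here are hypothetical (this is not Druid's internal API): a generic 
helper that retries a task when the thrown exception looks transient, 
sleeping with capped exponential backoff plus jitter between attempts.

   ```java
   import java.util.concurrent.Callable;
   import java.util.concurrent.ThreadLocalRandom;
   import java.util.function.Predicate;

   public class RetryWithBackoff
   {
     /**
      * Hypothetical helper: runs `task`, retrying up to `maxTries` times
      * when `isTransient` matches the thrown exception. Between attempts it
      * sleeps a random duration in [0, baseSleepMillis * 2^(attempt-1)]
      * (full jitter), so concurrent subtasks do not retry in lockstep.
      */
     public static <T> T call(
         Callable<T> task,
         Predicate<Throwable> isTransient,
         int maxTries,
         long baseSleepMillis
     ) throws Exception
     {
       for (int attempt = 1; ; attempt++) {
         try {
           return task.call();
         }
         catch (Exception e) {
           if (attempt >= maxTries || !isTransient.test(e)) {
             throw e; // exhausted retries, or a non-transient failure
           }
           long cap = baseSleepMillis * (1L << (attempt - 1));
           Thread.sleep(ThreadLocalRandom.current().nextLong(cap + 1));
         }
       }
     }
   }
   ```

   A caller would wrap each S3 read and treat a 503 SlowDown as transient, 
e.g. `RetryWithBackoff.call(() -> readObject(key), e -> 
e.getMessage().contains("SlowDown"), 5, 100)`. Ideally these knobs 
(max tries, base sleep) would be Druid tuning configs instead.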
   
   This type of error is especially painful since there is no configurable 
retry logic for top-level tasks 
(https://github.com/apache/druid/issues/5428). 
   

