Z08JBECH opened a new issue #11745:
URL: https://github.com/apache/druid/issues/11745


   On our production cluster, we load and parse data from an S3 bucket with 8 subtasks.
   We have a problem when we load data.
   
   **Sometimes Druid creates a segment with 2 partitions, for example:

f_stock_mouvement_api_2021-09-26T00:00:00.000Z_2021-09-27T00:00:00.000Z_2021-09-27T13:03:00.661Z

f_stock_mouvement_api_2021-09-26T00:00:00.000Z_2021-09-27T00:00:00.000Z_2021-09-27T13:03:00.661Z_1

   The 2 partitions contain the same data, so we have duplicates.**
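To confirm the duplication on the query side, one way is to count repeated rows per grouping key through Druid's SQL endpoint (`/druid/v2/sql`). Below is a minimal sketch that only builds the SQL statement; the column names (`__time`, `product_id`) are hypothetical and would need to match the real schema:

```python
def build_duplicate_check_sql(datasource, key_columns):
    """Build a Druid SQL query that reports grouping keys appearing more than once.

    POST the returned string as {"query": sql} to the Router's
    /druid/v2/sql endpoint to run it.
    """
    cols = ", ".join(key_columns)
    return (
        f"SELECT {cols}, COUNT(*) AS cnt "
        f'FROM "{datasource}" '
        f"GROUP BY {cols} "
        f"HAVING COUNT(*) > 1"
    )

# Hypothetical key columns for the datasource from this report.
sql = build_duplicate_check_sql("f_stock_mouvement_api", ["__time", "product_id"])
print(sql)
```

If every row comes back with `cnt = 2`, that matches the two identical partitions described above.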
   
   It seems that the process runs in parallel in different subtasks.
   
   When I checked the logs from all subtasks, I found the following in two subtasks:
   -------------------------------
   mycompany",
             "path" : "cds/f_stock_mouvement_api/year=2021/month=9/day=26/0027_part_00.gz"
           }, {
             "bucket" : "mycompany",
             "path" : "cds/f_stock_mouvement_api/year=2021/month=9/day=26/0028_part_00.gz"
           }, {
             "bucket" : "mycompany",
             "path" : "cds/f_stock_mouvement_api/year=2021/month=9/day=26/0029_part_00.gz"
           }, {
             "bucket" : "mycompany",
             "path" : "cds/f_stock_mouvement_api/year=2021/month=9/day=26/0030_part_00.gz"
           }, {
             "bucket" : "mycompany",
             "path" : "cds/f_stock_mouvement_api/year=2021/month=9/day=26/0031_part_00.gz"
           }
   ....
   ---------------------------
   So Druid loads some data twice in the same parallel ingestion, in 2 different subtasks.
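A quick way to verify this overlap is to extract the `"path"` values from each subtask's log and intersect the resulting sets. A minimal sketch (the log excerpts below are hypothetical stand-ins for the real task logs):

```python
import re

def extract_paths(log_text):
    """Collect every "path" value from a subtask's log excerpt."""
    return set(re.findall(r'"path"\s*:\s*"([^"]+)"', log_text))

# Hypothetical excerpts from two different subtask logs.
subtask_a = '''
  { "bucket" : "mycompany",
    "path" : "cds/f_stock_mouvement_api/year=2021/month=9/day=26/0027_part_00.gz" },
  { "bucket" : "mycompany",
    "path" : "cds/f_stock_mouvement_api/year=2021/month=9/day=26/0028_part_00.gz" }
'''
subtask_b = '''
  { "bucket" : "mycompany",
    "path" : "cds/f_stock_mouvement_api/year=2021/month=9/day=26/0028_part_00.gz" }
'''

# Any path appearing in both logs was ingested twice.
duplicated = extract_paths(subtask_a) & extract_paths(subtask_b)
print(sorted(duplicated))
# → ['cds/f_stock_mouvement_api/year=2021/month=9/day=26/0028_part_00.gz']
```

A non-empty intersection across two subtasks would confirm that the same input file was assigned to both.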
   
   
   **My Druid version is 0.21.**
   
   Is this a known problem?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

