hqx871 commented on PR #13303:
URL: https://github.com/apache/druid/pull/13303#issuecomment-1416646001

   > @hqx871 , I have left a few more comments.
   > 
   > Regarding the `sample` parameter:
   > 
   > * A more appropriate name for it would be `samplingFactor` since higher 
the value, fewer the samples we take. A value of 1 denoting no sampling at all.
   > * It would be better to pull the sampling logic out of 
`DeterminePartitionsJob` into a separate class and reuse it in the two places.
   > * It would also be nice to get some numbers to gauge the benefit that 
sampling provides and perhaps have a way for users to estimate the right 
sampling factor. Considering this, I would suggest to remove all the sampling 
logic from this PR and create a separate one for that, so that we can get the 
current one merged faster.
   > 
   > Also, could you please add a comment regarding the testing that you have 
done? I will try to run the hadoop ITs to ensure that things work as expected.
   
   I have removed all the sampling logic from this PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to