karanmehta93 commented on issue #419: PHOENIX-4009 Run UPDATE STATISTICS 
command by using MR integration on…
URL: https://github.com/apache/phoenix/pull/419#issuecomment-453618632
 
 
   > There are always some jobs (maybe a few a day) that fail because a few 
mappers in a job keep failing and even retries can't get past the issue, 
which causes the whole job to fail -- this is the case I'm talking about, and 
it happens more frequently when something bad happens in the cluster. 
   
   I understand your concern, and I agree that it can happen often. As you 
already pointed out, the simplest way to combat that is to retry the whole job 
(possibly at certain intervals) and hope that it eventually succeeds. If it 
still fails, we can raise appropriate alerts using the monitoring infrastructure.
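
   As a rough illustration of the retry-then-alert idea, here is a minimal 
sketch. `JobRetrier` and `runWithRetries` are hypothetical names, not part of 
Phoenix or Hadoop; the job is modeled as a `BooleanSupplier` that returns true 
on success, and raising the alert is left to the caller.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not a Phoenix class): reruns a whole job a bounded number of times. */
final class JobRetrier {

    /**
     * Runs the job until it reports success or maxAttempts is exhausted,
     * sleeping waitMillis between attempts.
     *
     * @return the number of attempts used, or -1 if every attempt failed
     *         (or the wait was interrupted), in which case the caller can alert
     */
    static int runWithRetries(BooleanSupplier job, int maxAttempts, long waitMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (job.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(waitMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    return -1;
                }
            }
        }
        return -1;
    }
}
```

   A caller would treat a return value of -1 as the signal to raise an alert.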
   
   > In this case, I want the retry job to skip the regions whose stats have 
already been updated and only do minimal work, so it wouldn't worsen the bad 
situation in the cluster and we can easily catch up to avoid missing the SLA. 
For the current phase, I'm ok to proceed without this skip check.
   
   I understand the idea. Determining which regions' data is missing from the 
SYSTEM.STATS table is not possible (as part of this code), since the snapshot 
might have changed between the two jobs. 
   A better way (in my understanding) of implementing this feature would be a 
wrapper class for this tool that is aware of the job id and other details 
of the previous job. It can ensure that the job runs on the same snapshot 
every time and that mappers are spawned only where work remains (or, even if 
all mappers are launched, most of them are no-ops). At this point, I feel that 
we should skip it; however, feel free to add this as a potential enhancement 
to PHOENIX-5091. 
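
   To make the wrapper idea a bit more concrete, here is a minimal sketch 
under stated assumptions. `StatsJobTracker` and all of its methods are 
hypothetical and use no Phoenix or Hadoop API; regions are identified by plain 
strings, and the state is kept in memory only, whereas a real wrapper would 
have to persist it between runs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical bookkeeping for a retry-aware wrapper (not a Phoenix class). */
final class StatsJobTracker {
    // The snapshot every run of this job is pinned to.
    private final String snapshotName;
    // Regions whose stats were written by an earlier attempt.
    private final Set<String> completedRegions = new HashSet<>();

    StatsJobTracker(String snapshotName) {
        this.snapshotName = snapshotName;
    }

    String snapshotName() {
        return snapshotName;
    }

    /** Records that stats for the given region have been updated. */
    void markCompleted(String region) {
        completedRegions.add(region);
    }

    /** Returns the regions of the pinned snapshot that still need work, in input order. */
    List<String> pendingRegions(List<String> snapshotRegions) {
        List<String> pending = new ArrayList<>();
        for (String region : snapshotRegions) {
            if (!completedRegions.contains(region)) {
                pending.add(region);
            }
        }
        return pending;
    }
}
```

   A retry would then launch mappers only for `pendingRegions(...)`, or launch 
all of them and let the mappers for completed regions return immediately.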

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
