BinShi-SecularBird edited a comment on issue #419: PHOENIX-4009 Run UPDATE 
STATISTICS command by using MR integration on…
URL: https://github.com/apache/phoenix/pull/419#issuecomment-453271634
 
 
   > > this is a common case - we always have some jobs that finish most of the 
work (stats updated for > 90% of regions) but then fail for some reason; the 
retry job should do only minimal work instead of updating the stats of the 
whole table again.
   > 
   > This should not be the common case. The common case should be that a few 
mappers fail for some reason, and we have retries for that. The job should NOT 
fail as a whole. Even the MR framework doesn't persist data between jobs, and it 
can get really tricky depending on the use case. A better way is to make the job 
idempotent so that a re-run doesn't affect it.
   > 
   > Also, it is hard in this case since we don't know what changed between 
retries. If the snapshot name changed, that can potentially affect region 
boundaries. We would need another level of orchestration that persists stats MR 
job information in some table, which we would look up before running the current 
job. These cases would be difficult to handle. I would prefer that we try to 
avoid that complexity here.
   
   There are always some jobs failing (possibly a few in a day) because a few 
mappers in a job keep failing and even the retries can't get past the issue, 
which causes the whole job to fail. This is the case I'm talking about, and it 
happens more frequently when something bad happens in the cluster. In this case, 
I want the retry job to skip the regions whose stats have already been updated 
and do only the minimal remaining work, so that it doesn't worsen the bad 
situation in the cluster and we can catch up easily without missing the SLA. 
For the current phase, I'm OK to proceed without this skip check.
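
   To make the skip check concrete, here is a rough sketch of what I have in 
mind. It is not code from this PR: `StatsRetryPlanner`, `regionsToUpdate`, and 
the timestamp map are hypothetical names, and I am assuming the caller can look 
up when each region's stats were last written. It also ignores the case raised 
above where region boundaries change between retries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Phoenix): picks the regions that a
// retried stats job still has to process.
final class StatsRetryPlanner {

    private StatsRetryPlanner() {}

    /**
     * @param lastStatsUpdateMillis region name -> time its stats were last
     *        written, or null if they were never written (assumed to be
     *        looked up by the caller)
     * @param firstAttemptStartMillis start time of the first attempt of
     *        this stats run
     * @return regions whose stats are missing or older than the first attempt
     */
    static List<String> regionsToUpdate(Map<String, Long> lastStatsUpdateMillis,
                                        long firstAttemptStartMillis) {
        List<String> pending = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastStatsUpdateMillis.entrySet()) {
            Long updatedAt = e.getValue();
            // Skip regions already refreshed by an earlier attempt of this run.
            if (updatedAt == null || updatedAt < firstAttemptStartMillis) {
                pending.add(e.getKey());
            }
        }
        return pending;
    }
}
```

   With a check like this, a retry after a job that covered most regions would 
only touch the remainder, and re-running it after a fully successful attempt 
would select nothing, which also keeps the job idempotent.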

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
