Github user paul-guo- commented on a diff in the pull request:
https://github.com/apache/incubator-hawq/pull/1141#discussion_r102657361
--- Diff: src/backend/executor/nodeShareInputScan.c ---
@@ -925,9 +923,12 @@ writer_wait_for_acks(ShareInput_Lk_Context *pctxt, int share_id, int xslice)
int save_errno = errno;
elog(LOG, "SISC WRITER (shareid=%d, slice=%d): notify still wait for an answer, errno %d",
share_id, currentSliceId, save_errno);
- /*if error(except EINTR) happens in select, we just return to avoid endless loop*/
- if(errno != EINTR){
- return;
+ if(save_errno == EBADF)
+ {
+ /* The file description is invalid, maybe this FD has been already closed by writer in some cases
+ * we need to break here to avoid endless loop and continue to run CHECK_FOR_INTERRUPTS.
+ */
+ break;
--- End diff --
Normally EAGAIN should not happen (I did not see it in the Linux man page,
though I did see it in the Mac man page), but if it does happen, yes, we should
continue just as we do for EINTR.
For ENOMEM: it is debatable whether to quit or to keep trying and risk letting
the OS kill one or more processes. Personally I'd quit the query rather than
risk having any process killed. For this error code, continue or error out; it
is up to you.
For EINVAL and any other possible errno (which would indicate a bug in either
the man page or the kernel), we should not trust or depend on the OS, because
the cost could be high (a hang), given that we do not have a retry time limit
here. (By the way, we really should have such a mechanism for every while loop
that could hang.)
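To make the suggestion concrete, here is a rough sketch of how the errno handling after a failed select() could be split into "retry" and "give up", with a cap on retries. This is only an illustration of the policy described above, not the code in the PR: `wait_action`, `classify_select_errno`, `may_retry` and `max_retries` are hypothetical names, and treating ENOMEM as "give up" is just my preference.

```c
#include <errno.h>

/* Hypothetical names for illustration; none of these exist in HAWQ. */
typedef enum
{
    WAIT_RETRY,   /* transient: call select() again */
    WAIT_GIVE_UP  /* leave the loop and let the caller error out */
} wait_action;

/* Map the errno saved after a failed select() to an action. */
static wait_action
classify_select_errno(int save_errno)
{
    switch (save_errno)
    {
        case EINTR:   /* interrupted by a signal */
        case EAGAIN:  /* not expected on Linux, but treat it like EINTR */
            return WAIT_RETRY;
        case ENOMEM:  /* debatable; this sketch chooses to quit */
        default:      /* EBADF, EINVAL, anything unexpected */
            return WAIT_GIVE_UP;
    }
}

/*
 * Bounded retry: returns 1 if the loop may try again, 0 otherwise.
 * max_retries is an assumed limit; the code under review has none.
 */
static int
may_retry(int save_errno, int *retries, int max_retries)
{
    if (classify_select_errno(save_errno) != WAIT_RETRY)
        return 0;
    if (*retries >= max_retries)
        return 0;
    (*retries)++;
    return 1;
}
```

In the wait loop, the caller would break out whenever `may_retry(save_errno, &retries, limit)` returns 0, so an unexpected errno or a run of repeated EINTR/EAGAIN results cannot turn into a hang.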