Hello,
At Tue, 11 Jul 2017 10:28:51 +0200, Antonin Houska <[email protected]> wrote in
<6448.1499761731@localhost>
> Kyotaro HORIGUCHI <[email protected]> wrote:
> > Effectively it is a waiting-queue followed by a
> > completed-list. The point of the compaction is keeping the order
> > of waiting or not-yet-completed requests, which is crucial to
> > avoid kind-a precedence inversion. We cannot keep the order by
> > using bitmapset in such way.
>
> > The current code waits all waiters at once and processes all
> > fired events at once. The order in the waiting-queue is
> > inessential in the case. On the other hand I suppoese waiting on
> > several-tens to near-hundred remote hosts is in a realistic
> > target range. Keeping the order could be crucial if we process a
> > part of the queue at once in the case.
> >
> > Putting siginificance on the deviation of response time of
> > remotes, process-all-at-once is effective. In turn we should
> > consider the effectiveness of the lifecycle of the larger wait
> > event set.
>
> ok, I missed the fact that the order of es_pending_async entries is
> important. I think this is worth adding a comment.
I'll put an upper limit to the number of waiters processed at
once. Then add a comment like that.
> Actually the reason I thought of simplification was that I noticed small
> inefficiency in the way you do the compaction. In particular, I think it's not
> always necessary to swap the tail and head entries. Would something like this
> make sense?
I'm not sure, but I suppose that it is rare that all of the first
many elements in the array are not COMPLETE. In most cases the
first element gets a response first.
>
> /* If any node completed, compact the array. */
> if (any_node_done)
> {
...
> for (tidx = 0; tidx < estate->es_num_pending_async;
> ++tidx)
> {
...
> if (tail->state == ASYNCREQ_COMPLETE)
> continue;
>
> /*
> * If the array starts with one or more
> incomplete requests,
> * both head and tail point at the same item,
> so there's no
> * point in swapping.
> */
> if (tidx > hidx)
> {
This works to skip the first several elements when all of them
are ASYNCREQ_COMPLETE. I think it makes sense as long as it
doesn't harm the loop. The optimization is more effective by
putting out of the loop like this.
| for (tidx = 0; tidx < estate->es_num_pending_async &&
estate->es_pending_async[tidx] == ASYNCREQ_COMPLETE; ++tidx);
| for (; tidx < estate->es_num_pending_async; ++tidx)
...
> And besides that, I think it'd be more intuitive if the meaning of "head" and
> "tail" was reversed: if the array is iterated from lower to higher positions,
> then I'd consider head to be at higher position, not tail.
Yeah, but maybe the "head" is still confusing even if reversed
because it is still not a head of something. It might be less
confusing by rewriting it in more verbose-but-straightforwad way.
| int npending = 0;
|
| /* Skip over not-completed items at the beginning */
| while (npending < estate->es_num_pending_async &&
| estate->es_pending_async[npending] != ASYNCREQ_COMPLETE)
| npending++;
|
| /* Scan over the rest for not-completed items */
| for (i = npending + 1 ; i < estate->es_num_pending_async; ++i)
| {
| PendingAsyncRequest *tmp;
| PendingAsyncRequest *curr = estate->es_pending_async[i];
|
| if (curr->state == ASYNCREQ_COMPLETE)
| continue;
|
| /* Move the not-completed item to the tail of the first chunk */
| tmp = estate->es_pending_async[i];
| estate->es_pending_async[nepending] = tmp;
| estate->es_pending_async[i] = tmp;
| ++npending;
| }
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers