kadirozde commented on pull request #936: URL: https://github.com/apache/phoenix/pull/936#issuecomment-724872173
> @kadirozde overall looks like a great improvement. I have added a few comments. Some questions:
>
> 1. Is it more beneficial to have paging based on row size rather than number of rows, since each row can be arbitrarily large?
> 2. Server-side pagination will help _reduce_ the chance of the race conditions mentioned in the Jira description, but does not aim at eliminating them, correct?
> 3. Though this is aimed at such race conditions related to mutations (server-side UPSERT SELECT/DELETE), it seems like it will also affect the normal read path for non-Group_By aggregate queries. Is there any negative effect/extra slowness during reads due to this pagination, and if yes, do we want to make sure that changes only affect the write paths?
>
> Let's also please add some tests for this.

1. I am not sure about it, but we can later introduce additional constraints, such as the total size of scanned bytes as you suggested, to further improve this feature.
2. This is correct. By itself, paging does not eliminate the race conditions. However, as an additional improvement, the client can wait for all the page operations to complete or fail before returning to the application. This will further reduce the race conditions. I think we have to enforce the client-side timestamp to make the race almost impossible.
3. I expect this feature to improve overall performance and availability, since paging limits memory usage and the time for which server resources are held. My experience with paging on a real cluster is very positive. I have not seen any negative impact so far, as long as the page size is not very small (e.g., fewer than 1000 rows).
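To make point 1 concrete, here is a minimal sketch of how a page could be bounded by both a row count and a scanned-byte budget. It is not the code in this PR: `PagingSketch`, `nextPage`, `paginate`, `maxRows`, and `maxBytes` are hypothetical names, and rows are modeled as plain byte arrays rather than HBase cells.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; none of these names exist in Phoenix. */
final class PagingSketch {

    /** One page: rows [start, end) of the input and their total size in bytes. */
    static final class Page {
        final int start;
        final int end; // exclusive
        final long bytes;

        Page(int start, int end, long bytes) {
            this.start = start;
            this.end = end;
            this.bytes = bytes;
        }

        int rowCount() {
            return end - start;
        }
    }

    /** Builds one page starting at {@code start}, stopping at whichever limit is hit first. */
    static Page nextPage(List<byte[]> rows, int start, int maxRows, long maxBytes) {
        if (maxRows < 1) {
            throw new IllegalArgumentException("maxRows must be at least 1");
        }
        int end = start;
        long bytes = 0;
        while (end < rows.size() && end - start < maxRows) {
            long rowSize = rows.get(end).length;
            // Always admit the first row of a page, so a single row larger
            // than the byte budget cannot stall the scan.
            if (end > start && bytes + rowSize > maxBytes) {
                break;
            }
            bytes += rowSize;
            end++;
        }
        return new Page(start, end, bytes);
    }

    /** Splits all rows into consecutive pages. */
    static List<Page> paginate(List<byte[]> rows, int maxRows, long maxBytes) {
        List<Page> pages = new ArrayList<>();
        int start = 0;
        while (start < rows.size()) {
            Page page = nextPage(rows, start, maxRows, maxBytes);
            pages.add(page);
            start = page.end;
        }
        return pages;
    }

    public static void main(String[] args) {
        List<byte[]> rows = Arrays.asList(
                new byte[10], new byte[20], new byte[30], new byte[500], new byte[5]);
        for (Page page : paginate(rows, 3, 100L)) {
            System.out.println("rows [" + page.start + ", " + page.end + ") bytes=" + page.bytes);
        }
    }
}
```

With a row-count limit alone, the 500-byte row would be grouped with its neighbors regardless of size. Under the assumed byte budget it ends up in a page of its own, which is the effect the size-based constraint is meant to have.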
