link3280 commented on code in PR #4701:
URL: https://github.com/apache/kyuubi/pull/4701#discussion_r1166726776


##########
externals/kyuubi-flink-sql-engine/src/main/scala/org/apache/kyuubi/engine/flink/operation/ExecuteStatement.scala:
##########
@@ -115,16 +115,16 @@ class ExecuteStatement(
       while (loop) {
         Thread.sleep(50) // slow the processing down
 
-        val pageSize = Math.min(500, resultMaxRows)
-        val result = executor.snapshotResult(sessionId, resultId, pageSize)
+        val result = executor.snapshotResult(sessionId, resultId, resultMaxRows)

Review Comment:
   > If I understand correctly, this change fixes a correctness issue, but breaks the original design - retrieve the result in small pages to avoid OOM, am I right?
   > 
   > also cc @link3280 and @bowenliang123
   
   Not really. The row limit is what prevents OOM; the page size is designed so we keep fetching rows until the limit is reached or all records are read, because a single fetch may not return enough rows (consider Kafka as the source).
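   
   A minimal sketch of the paged-fetch design described here, assuming a simplified executor interface (the `ResultFetcher` trait, its `snapshot` method, and the `Row` case class are hypothetical stand-ins for illustration, not the actual Flink/Kyuubi API):
   
   ```scala
   import scala.collection.mutable.ArrayBuffer
   
   // Hypothetical row type and executor facade, used only for illustration.
   case class Row(values: Seq[Any])
   
   trait ResultFetcher {
     // Returns up to `pageSize` rows, or None once the result set is exhausted.
     def snapshot(pageSize: Int): Option[Seq[Row]]
   }
   
   def fetchUpTo(fetcher: ResultFetcher, resultMaxRows: Int): Seq[Row] = {
     val rows = ArrayBuffer.empty[Row]
     // Small pages bound the memory used per fetch; the overall cap is resultMaxRows.
     val pageSize = math.min(500, resultMaxRows)
     var eos = false
     // Keep fetching until the row limit is reached or the source is drained,
     // since a single snapshot may return fewer rows than requested (e.g. Kafka).
     while (!eos && rows.size < resultMaxRows) {
       Thread.sleep(50) // slow the polling down, mirroring the original loop
       fetcher.snapshot(pageSize) match {
         case Some(page) => rows ++= page.take(resultMaxRows - rows.size)
         case None       => eos = true
       }
     }
     rows.toSeq
   }
   ```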



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

