cgivre commented on a change in pull request #2189:
URL: https://github.com/apache/drill/pull/2189#discussion_r593894152



##########
File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/stream/StreamingHttpConnection.java
##########
@@ -104,11 +104,9 @@ public void outputAvailable(OutputStream out) throws 
IOException {
   public void sendData(RpcOutcomeListener<Ack> listener, QueryDataPackage 
data) {
     VectorContainer batch = data.batch();
     try {
-      if (batchCount == 0) {
-        batchHolder = new BatchHolder(batch);
-        reader = new PushResultSetReaderImpl(batchHolder);
-        startSignal.await();
-      }
+      batchHolder = new BatchHolder(batch);
+      reader = new PushResultSetReaderImpl(batchHolder);
+      startSignal.await();

Review comment:
       @paul-rogers 
   Thanks for taking a look at this.  When I was stepping through this (full 
disclosure, I got my second COVID shot on Friday and was really groggy 
yesterday), I noticed that the issue seemed to be on this line:
   
https://github.com/apache/drill/blob/4e514c214091b21c334f0d3b2c14c1efed381850/exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/stream/StreamingHttpConnection.java#L112
   
   Let's say that you have a query that returns `n` batches `b1...bn`.   As it 
was originally implemented, the first batch would emit the headers with an 
empty array for `rows`.   So far so good.  
   
   When you hit the next batch (`b2`), nothing was ever being done with `b2` 
and most importantly, `b1` was being sent again to `emitBatch()`.   If you 
notice, we create a new `VectorContainer` from the incoming data, but if those 
lines are skipped, nothing happens with that `VectorContainer`.   The end 
result is that `b1` gets emitted `n` times.  Since `b1` has no rows, you get an 
empty dataset. 
   
   Like I said, I was pretty groggy yesterday, so I'm only about 78.346% sure 
that's what's happening.  
   
   If the concern is an OOM situation, one option might be to change the scope 
of the `reader` and `batchHolder` variables so that they are local to that 
function.  They don't seem to be used outside of the `sendData` function. 
   
    
   Tagging @Daddykong for additional sanity check.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to