Weston Pace created ARROW-12523:
-----------------------------------

             Summary: [C++] [Dataset] Remove buffering from AsyncScanner
                 Key: ARROW-12523
                 URL: https://issues.apache.org/jira/browse/ARROW-12523
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace


The MakeEnumeratedGenerator operator buffers one block so that it can correctly
mark a block as "last" (i.e. when it receives an EOF it releases the buffered
block, marks it last, and then releases the EOF).
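The following is a minimal synchronous sketch of this one-block look-ahead. It
is not Arrow's implementation (the real operator is asynchronous and works on
futures). `EnumeratedItem`, `Source`, and `BufferedEnumerator` are hypothetical
names chosen for illustration, and a source signals EOF by returning
`std::nullopt`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical enumerated item: the value, its position, and whether it is
// the final item before end-of-stream.
template <typename T>
struct EnumeratedItem {
  T value;
  int index;
  bool last;
};

// Hypothetical pull-based source: returns std::nullopt to signal EOF.
template <typename T>
using Source = std::function<std::optional<T>()>;

// "Buffer by one" enumeration: to know whether item i is the last one, item
// i+1 (or EOF) must already have been pulled from the source.
template <typename T>
class BufferedEnumerator {
 public:
  explicit BufferedEnumerator(Source<T> source) : source_(std::move(source)) {}

  std::optional<EnumeratedItem<T>> Next() {
    if (!started_) {
      // Prime the one-item buffer on the first call.
      pending_ = source_();
      started_ = true;
    }
    if (!pending_) return std::nullopt;  // EOF: nothing buffered
    // Pull the next item *before* releasing the buffered one.
    std::optional<T> next = source_();
    EnumeratedItem<T> out{std::move(*pending_), index_++, !next.has_value()};
    pending_ = std::move(next);
    return out;
  }

 private:
  Source<T> source_;
  std::optional<T> pending_;
  bool started_ = false;
  int index_ = 0;
};
```

Note that `Next()` pulls item i+1 from the source before handing item i
downstream. This is the interleaving that causes a thread to decode one batch
and then process the previous one.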

However, this adds complexity (this is very evident in the testing for
unordered scan) and could disrupt cache locality. For example, a thread will
receive batch X and parse & decode it, but then filter and project batch X-1
rather than the batch it just decoded.

We could push the responsibility of tagging the last batch/fragment into the
readers themselves, or we could release an empty "last" batch that serves as a
token to the later resequencer (think of it as an end-of-fragment token in
addition to the end-of-scan token we already have).
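The end-of-fragment token option could look roughly like the sketch below,
assuming each reader emits its batches followed by one empty token whose
position is one past its last batch. `ScanItem` and `Resequencer` are
hypothetical names, batches are stood in for by plain `int`s, and this is not
the resequencer Arrow actually uses.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical scan item: a real batch, or an empty end-of-fragment token.
struct ScanItem {
  int fragment_index;
  int batch_index;           // position within the fragment
  std::optional<int> batch;  // std::nullopt => end-of-fragment token
};

// Restores (fragment, batch) order from items that arrive in any order.
// A fragment is complete once its token and every batch before it have
// arrived, so no stage has to hold a batch back to learn it was the last.
class Resequencer {
 public:
  // Accepts one item; returns the batches that can now be emitted in order.
  std::vector<int> Push(const ScanItem& item) {
    pending_[{item.fragment_index, item.batch_index}] = item.batch;
    std::vector<int> ready;
    while (true) {
      auto it = pending_.find({next_fragment_, next_batch_});
      if (it == pending_.end()) break;  // next expected item not here yet
      if (it->second) {
        ready.push_back(*it->second);
        ++next_batch_;
      } else {
        // End-of-fragment token: move on to the next fragment.
        ++next_fragment_;
        next_batch_ = 0;
      }
      pending_.erase(it);
    }
    return ready;
  }

 private:
  std::map<std::pair<int, int>, std::optional<int>> pending_;
  int next_fragment_ = 0;
  int next_batch_ = 0;
};
```

For example, fragment 1's batch can arrive first and is simply held until
fragment 0's token has been seen.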



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
