[GitHub] [arrow-datafusion] Dandandan opened a new issue, #3214: Don't scan first column on empty projection

GitBox Sun, 21 Aug 2022 02:35:27 -0700


Dandandan opened a new issue, #3214:
URL: https://github.com/apache/arrow-datafusion/issues/3214


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   When we run a query that doesn't need any column values, like `SELECT COUNT(1) FROM table`, the plan still reads the first column (whatever it happens to be). This is inefficient: for columnar formats like Parquet we could skip scanning/reading the column entirely and just produce the row counts, and for non-columnar formats it would avoid unnecessary parsing.
   
   ```
   Projection: Count(1)
     TableScan: test projection=[a]
   ```
   
   Should become:
   
   ```
   Projection: Count(1)
     TableScan: test projection=[]
   ```
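   A minimal sketch of the difference, assuming a hypothetical, heavily simplified expression type (`Expr`, `collect_columns` and `required_projection` are illustrative names, not DataFusion's API). Projection push-down collects the column indices each expression reads; the `force_first_column` flag models the current fallback that inserts column 0 when no column is referenced.

   ```rust
   use std::collections::BTreeSet;

   /// Hypothetical, simplified expression type (NOT DataFusion's `Expr`).
   enum Expr {
       Column(usize), // reference to a table column by index
       Literal(i64),
       Count(Box<Expr>),
   }

   /// Collect the indices of the table columns an expression actually reads.
   fn collect_columns(expr: &Expr, out: &mut BTreeSet<usize>) {
       match expr {
           Expr::Column(i) => {
               out.insert(*i);
           }
           Expr::Literal(_) => {}
           Expr::Count(inner) => collect_columns(inner, out),
       }
   }

   /// Compute the scan projection for a list of expressions.
   /// `force_first_column = true` mimics the fallback the issue wants removed
   /// (`projection.insert(0);` when nothing is referenced); `false` keeps it empty.
   fn required_projection(exprs: &[Expr], force_first_column: bool) -> Vec<usize> {
       let mut projection = BTreeSet::new();
       for e in exprs {
           collect_columns(e, &mut projection);
       }
       if projection.is_empty() && force_first_column {
           projection.insert(0);
       }
       projection.into_iter().collect()
   }

   fn main() {
       // SELECT COUNT(1) FROM test: no column is referenced.
       let count_one = vec![Expr::Count(Box::new(Expr::Literal(1)))];
       println!("{:?}", required_projection(&count_one, true)); // [0]
       println!("{:?}", required_projection(&count_one, false)); // []

       // SELECT COUNT(c) FROM test, where c is column 2: unaffected by the flag.
       let count_c = vec![Expr::Count(Box::new(Expr::Column(2)))];
       println!("{:?}", required_projection(&count_c, false)); // [2]
   }
   ```

   Queries that reference at least one column produce the same projection either way; only the "no columns referenced" case changes, which is why the fallback can be dropped once the readers handle empty projections.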
   
   
   **Describe the solution you'd like**
   We can push the responsibility for producing output with the right number of rows into the individual readers: with an empty projection they should produce `RecordBatch`es that contain no columns but still carry the correct row count.
   We should then remove the line `projection.insert(0);` from projection push-down.
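   A minimal, std-only sketch of the reader side, assuming a hypothetical `Batch` struct standing in for Arrow's `RecordBatch` (a batch with no columns cannot infer its length, so the row count is stored explicitly) and a hypothetical in-memory `scan` function. With an empty projection the reader copies no column values, and each batch only reports how many rows it covers, which is all `COUNT(1)` needs.

   ```rust
   /// Hypothetical stand-in for an Arrow `RecordBatch`: it may have zero
   /// columns but must still know how many rows it represents.
   struct Batch {
       columns: Vec<Vec<i64>>, // one Vec per projected column
       num_rows: usize,
   }

   /// Hypothetical in-memory reader over column-major `data`.
   /// `num_rows` plays the role of table metadata (e.g. a row count known
   /// without decoding values). Only the projected columns are copied.
   fn scan(data: &[Vec<i64>], num_rows: usize, projection: &[usize], batch_size: usize) -> Vec<Batch> {
       assert!(batch_size > 0, "batch_size must be positive");
       let mut batches = Vec::new();
       let mut start = 0;
       while start < num_rows {
           let end = (start + batch_size).min(num_rows);
           // With an empty projection this is an empty Vec: no values are read.
           let columns = projection
               .iter()
               .map(|&i| data[i][start..end].to_vec())
               .collect();
           batches.push(Batch { columns, num_rows: end - start });
           start = end;
       }
       batches
   }

   /// COUNT(1) only needs each batch's row count, never the column values.
   fn count_rows(batches: &[Batch]) -> usize {
       batches.iter().map(|b| b.num_rows).sum()
   }

   fn main() {
       // Table `test` with two columns and 5 rows.
       let data = vec![vec![1, 2, 3, 4, 5], vec![10, 20, 30, 40, 50]];
       let batches = scan(&data, 5, &[], 2);
       // Three batches (2 + 2 + 1 rows), each with zero columns.
       assert!(batches.iter().all(|b| b.columns.is_empty()));
       println!("{}", count_rows(&batches)); // 5
   }
   ```

   A real Parquet reader could take the per-batch row counts from row-group metadata rather than from a `num_rows` argument; the point of the sketch is only that the count flows through batches without any column data.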
   
   **Describe alternatives you've considered**
   n/a
   
   **Additional context**
   Some queries in the ClickBench benchmark (https://benchmark.clickhouse.com/) show this performance issue.

