[GitHub] [arrow-datafusion] alamb opened a new issue #250: Incorrect answers with SELECT DISTINCT queries

GitBox Mon, 03 May 2021 13:48:22 -0700


alamb opened a new issue #250:
URL: https://github.com/apache/arrow-datafusion/issues/250



   **Describe the bug**
   `SELECT DISTINCT` queries produce incorrect results: duplicate rows are not
   removed. (This is about plain `SELECT DISTINCT`, not distinct aggregates such
   as `SELECT COUNT(DISTINCT ...)`.)
   
   **To Reproduce**
   
   ```shell
   echo "A" > /tmp/foo.csv
   echo "B" >> /tmp/foo.csv
   echo "B" >> /tmp/foo.csv
   echo "B" >> /tmp/foo.csv
   ```
   
   Then in the datafusion-cli:
   
   ```
   > CREATE EXTERNAL TABLE t(col varchar)
   STORED AS CSV
   LOCATION '/tmp/foo.csv';
   0 rows in set. Query took 0 seconds.
   > SELECT DISTINCT col from t
   > ;
   +-----+
   | col |
   +-----+
   | A   |
   | B   |
   | B   |
   | B   |
   +-----+
   ```
   
   **Expected behavior**
   The result should contain no duplicates: only 2 rows, with the values `A`
   and `B`.
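
   As a quick cross-check that does not involve DataFusion, the expected
   distinct values can be computed directly from the sample file. This is only
   a sketch: it assumes a POSIX shell with `sort` available, reuses the
   `/tmp/foo.csv` path from the reproduction steps, and the `distinct_rows`
   variable name is made up for illustration.

   ```shell
   # Recreate the sample file from the reproduction steps (one value per line).
   printf 'A\nB\nB\nB\n' > /tmp/foo.csv

   # sort -u sorts the lines and keeps a single copy of each distinct line,
   # which is the set of rows SELECT DISTINCT col should return.
   distinct_rows=$(sort -u /tmp/foo.csv)

   # Prints two lines: A, then B.
   echo "$distinct_rows"
   ```

   If the query result has more rows than this output, duplicates are leaking
   through.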
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
