alamb opened a new issue #250:
URL: https://github.com/apache/arrow-datafusion/issues/250


   **Describe the bug**
   `SELECT DISTINCT` queries (as opposed to distinct aggregates, e.g. `SELECT COUNT(DISTINCT ...)`) produce incorrect results: duplicate rows are not removed.
   
   **To Reproduce**
   
   ```shell
   echo "A" > /tmp/foo.csv
   echo "B" >> /tmp/foo.csv
   echo "B" >> /tmp/foo.csv
   echo "B" >> /tmp/foo.csv
   ```
   
   Then, in `datafusion-cli`:
   
   ```
   > CREATE EXTERNAL TABLE t(col varchar)
   STORED AS CSV
   LOCATION '/tmp/foo.csv';
   0 rows in set. Query took 0 seconds.
   > SELECT DISTINCT col from t
   > ;
   +-----+
   | col |
   +-----+
   | A   |
   | B   |
   | B   |
   | B   |
   +-----+
   ```
   
   **Expected behavior**
   The result should contain no duplicates: exactly 2 rows, with the values `A` and `B`.
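
For reference, the expected semantics can be checked against another SQL engine. The sketch below uses Python's built-in `sqlite3` module as a stand-in (an assumption made for illustration; it is not DataFusion code) and loads the same four values as the reproduction. It also runs the equivalent `GROUP BY` form, which should return the same rows as `SELECT DISTINCT`.

```python
import sqlite3

# Reference semantics only: SQLite stands in for a SQL engine here.
# This does not exercise DataFusion; it just shows what the query should return.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(col TEXT)")

# Same data as /tmp/foo.csv in the reproduction: one "A" and three "B" rows.
conn.executemany("INSERT INTO t(col) VALUES (?)", [("A",), ("B",), ("B",), ("B",)])

# SELECT DISTINCT should collapse duplicate rows.
distinct_rows = conn.execute("SELECT DISTINCT col FROM t ORDER BY col").fetchall()

# Grouping by every selected column is an equivalent way to express the same result.
grouped_rows = conn.execute("SELECT col FROM t GROUP BY col ORDER BY col").fetchall()

print(distinct_rows)  # [('A',), ('B',)]
print(grouped_rows)   # [('A',), ('B',)]
conn.close()
```

The `ORDER BY` is only there to make the comparison deterministic; `SELECT DISTINCT` itself does not guarantee any row order.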
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

