alamb opened a new issue #250: URL: https://github.com/apache/arrow-datafusion/issues/250
**Describe the bug**
`SELECT DISTINCT` (note: not distinct aggregates, e.g. `SELECT COUNT(DISTINCT ...)`) produces incorrect results.

**To Reproduce**
```shell
echo "A" > /tmp/foo.csv
echo "B" >> /tmp/foo.csv
echo "B" >> /tmp/foo.csv
echo "B" >> /tmp/foo.csv
```

Then in the datafusion-cli:

```
> CREATE EXTERNAL TABLE t(col varchar) STORED AS CSV LOCATION '/tmp/foo.csv';
0 rows in set. Query took 0 seconds.
> SELECT DISTINCT col from t
> ;
+-----+
| col |
+-----+
| A   |
| B   |
| B   |
| B   |
+-----+
```

**Expected behavior**
The result should contain no duplicates: only 2 rows, with the values `A` and `B`.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
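
As a cross-check on the expected result, the distinct values of the same four-row input can be computed outside DataFusion. The following is only a sketch: it writes the data to a temporary file (a hypothetical path, so `/tmp/foo.csv` is left untouched) and uses `sort -u` as a stand-in for `SELECT DISTINCT` on a single column. It says nothing about how DataFusion implements or should implement the query.

```shell
# Recreate the four rows from the report in a temporary file
# (hypothetical path chosen by mktemp, not the one used in the report).
csv="$(mktemp)"
printf 'A\nB\nB\nB\n' > "$csv"

# sort -u prints each distinct line exactly once.
distinct="$(sort -u "$csv")"
echo "$distinct"

rm -f "$csv"
```

This prints two lines, `A` and `B`, which matches the two rows the query is expected to return.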
