[ https://issues.apache.org/jira/browse/DRILL-4898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Khurram Faraaz updated DRILL-4898:
----------------------------------
    Description:
incorrect results : Query on directory containing CSV data
directory has 4534327 number of rows ~ 4.5M records (there are 6 CSV files)
Drill 1.9.0 commit ID: f3c26e34
I can share the data to reproduce the issue.

Note that data in columns[3] has the value "B02512\r" in query results.
{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `uber_trip_data` limit 5;
+----------------------------------------------------------+
|                         columns                          |
+----------------------------------------------------------+
| ["2014-08-01 00:03:00","40.7366","-73.9906","B02512\r"]  |
| ["2014-08-01 00:09:00","40.726","-73.9918","B02512\r"]   |
| ["2014-08-01 00:12:00","40.7209","-74.0507","B02512\r"]  |
| ["2014-08-01 00:12:00","40.7387","-73.9856","B02512\r"]  |
| ["2014-08-01 00:12:00","40.7323","-74.0077","B02512\r"]  |
+----------------------------------------------------------+
5 rows selected (0.184 seconds)
{noformat}

But when we do a select on columns[3] we see a different value in the query result.
{noformat}
0: jdbc:drill:schema=dfs.tmp> select columns[3] from `uber_trip_data` limit 5;
+----------+
|  EXPR$0  |
+----------+
|02512
|02512
|02512
|02512
|02512
+----------+
5 rows selected (0.159 seconds)
{noformat}

Searching for 'B02512' returns no rows. (whereas it should have returned data)
{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `uber_trip_data` where columns[3]='B02512';
+----------+
| columns  |
+----------+
+----------+
No rows selected (1.707 seconds)
{noformat}

  was:
incorrect results : Query on directory containing CSV data
directory has 4534327 number of rows ~ 4.5M records (there are 6 CSV files)
Drill 1.9.0 commit ID: f3c26e34
Data is available here - /home/MAPRTECH/qa/drill/uber_trip_data

Note that data in columns[3] has the value "B02512\r" in query results.
{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `uber_trip_data` limit 5;
+----------------------------------------------------------+
|                         columns                          |
+----------------------------------------------------------+
| ["2014-08-01 00:03:00","40.7366","-73.9906","B02512\r"]  |
| ["2014-08-01 00:09:00","40.726","-73.9918","B02512\r"]   |
| ["2014-08-01 00:12:00","40.7209","-74.0507","B02512\r"]  |
| ["2014-08-01 00:12:00","40.7387","-73.9856","B02512\r"]  |
| ["2014-08-01 00:12:00","40.7323","-74.0077","B02512\r"]  |
+----------------------------------------------------------+
5 rows selected (0.184 seconds)
{noformat}

But when we do a select on columns[3] we see a different value in the query result.
{noformat}
0: jdbc:drill:schema=dfs.tmp> select columns[3] from `uber_trip_data` limit 5;
+----------+
|  EXPR$0  |
+----------+
|02512
|02512
|02512
|02512
|02512
+----------+
5 rows selected (0.159 seconds)
{noformat}

Searching for 'B02512' returns no rows. (whereas it should have returned data)
{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `uber_trip_data` where columns[3]='B02512';
+----------+
| columns  |
+----------+
+----------+
No rows selected (1.707 seconds)
{noformat}


> wrong results : Query on directory containing CSV data
> ------------------------------------------------------
>
>                 Key: DRILL-4898
>                 URL: https://issues.apache.org/jira/browse/DRILL-4898
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.9.0
>         Environment: 4 node cluster
>            Reporter: Khurram Faraaz
>
> incorrect results : Query on directory containing CSV data
> directory has 4534327 number of rows ~ 4.5M records (there are 6 CSV files)
> Drill 1.9.0 commit ID: f3c26e34
> I can share the data to reproduce the issue.
> Note that data in columns[3] has the value "B02512\r" in query results.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from `uber_trip_data` limit 5;
> +----------------------------------------------------------+
> |                         columns                          |
> +----------------------------------------------------------+
> | ["2014-08-01 00:03:00","40.7366","-73.9906","B02512\r"]  |
> | ["2014-08-01 00:09:00","40.726","-73.9918","B02512\r"]   |
> | ["2014-08-01 00:12:00","40.7209","-74.0507","B02512\r"]  |
> | ["2014-08-01 00:12:00","40.7387","-73.9856","B02512\r"]  |
> | ["2014-08-01 00:12:00","40.7323","-74.0077","B02512\r"]  |
> +----------------------------------------------------------+
> 5 rows selected (0.184 seconds)
> {noformat}
> But when we do a select on columns[3] we see a different value in the query result.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select columns[3] from `uber_trip_data` limit 5;
> +----------+
> |  EXPR$0  |
> +----------+
> |02512
> |02512
> |02512
> |02512
> |02512
> +----------+
> 5 rows selected (0.159 seconds)
> {noformat}
> Searching for 'B02512' returns no rows. (whereas it should have returned data)
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from `uber_trip_data` where columns[3]='B02512';
> +----------+
> | columns  |
> +----------+
> +----------+
> No rows selected (1.707 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
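
The trailing "\r" in columns[3] is consistent with CRLF-terminated CSV files whose records are split on "\n" only. This is an assumption and is not confirmed anywhere in this message. Under that assumption, the minimal Python sketch below shows why an equality test against 'B02512' would find nothing, and why a terminal that honors the carriage return could redraw over the start of the row. The sample row is copied from the report; the helper names `split_on_lf` and `split_on_crlf` are hypothetical, and the code does not reproduce Drill's reader.

```python
# Sample record taken from the report, with an ASSUMED CRLF line ending.
raw = "2014-08-01 00:03:00,40.7366,-73.9906,B02512\r\n"


def split_on_lf(text):
    # Split records on LF only; any CR before the LF stays in the last field.
    return [line.split(",") for line in text.split("\n") if line]


def split_on_crlf(text):
    # str.splitlines() treats CRLF as a single line boundary, so no CR is left over.
    return [line.split(",") for line in text.splitlines() if line]


last_lf = split_on_lf(raw)[0][3]
last_crlf = split_on_crlf(raw)[0][3]

print(repr(last_lf))                      # the field still carries the carriage return
print(last_lf == "B02512")                # the equality filter does not match
print(last_lf.rstrip("\r") == "B02512")   # matches once the carriage return is removed
print(repr(last_crlf))                    # clean value when CRLF is the record delimiter
```

If this reading is right, either stripping the carriage return in the query or reading the files with a CRLF record delimiter would make the filter on 'B02512' match.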