zcattacz opened a new issue, #2982: URL: https://github.com/apache/drill/issues/2982
**Describe the bug**

Possible bug aspects:
- Drill seems to reject this simple query at random. If I keep retrying, it eventually succeeds without any change to the source Parquet files or the query.
- The query's `where` clause excludes the Parquet files known to lack the required columns, yet Drill rejects the query on random files, disregarding the `where` clause.

What happened:

Drill is used in a SCADA data processing pipeline. The SCADA data is a columnar table to which more and more channels (columns with a numeric ID and float data) are added over time. Each 24 hours of data is dumped into a separate Parquet file (e.g. m250131.parquet), and all files are stored in the same folder `d:/datarepo/fix1/`.

Because later files have more columns than older files, I use an initial query to filter the raw data into a separate temporary table and work only on that temporary table.

```
drop table if exists dfs.ds.metric_lines_raw;
create table dfs.ds.metric_lines_raw as
select index,
  `107`, `207`, `307`, `407`, `507`, `607`, `707`, `807`, `907`, `1007`,
  `1107`,`1207`,`1307`,`1407`,`1507`,`1607`,`1707`,`1807`,`1907`,`2007`,
  `2107`,`2207`,`2307`,`2407`,
  `10102`, `10202`, `10302`, `10402`, `10502`, `10602`, `10702`, `10802`, `10902`, `11002`,
  `11102`, `11202`, `11302`, `11402`, `11502`, `11602`, `11702`, `11802`, `11902`, `12002`,
  `12202`, `12302`, `12402`
from (select * from dfs.datarepo.`fix1` where `filename` like 'm25%')
where `filename` like 'm2501%'
   or `filename` like 'm2502%'
   or `filename` like 'm2503%'
;
```

I tried to keep the query simple, but Drill randomly rejects it with or without the subquery.

**Expected behavior**

For a simple select query, Drill should respect the `where` clause without probing irrelevant files. Even if the Parquet files have different numbers of columns, as long as the required columns exist and are valid, Drill is expected to proceed without issue.
**Error detail, log output or screenshots**

```
An error occurred when executing the SQL command:
create table dfs.ds.metric_lines_raw as select index, `107`, `207`, `307`, `407`, `507`, `607`, `707`, `807`, `907`, `1007`, `1107`,`1207`,`1307`,`...

DATA_READ ERROR: Exception occurred while reading from disk.

File: d:/datarepo/fix1/m210520.parquet
Column: 11102
Row Group Start: 775564
File: d:/datarepo/fix1/m210520.parquet
Column: 11102
Row Group Start: 775564
Fragment: 1:2

[Error Id: f75abdf9-dad7-48c3-b047-763d3e2a5edc on TVSSRV02:31010]

1 statement failed.
Execution time: 9.07s

--- retry ---
....
File: d:/datarepo/fix1/m210103.parquet
Column: 107
Row Group Start: 292
File: d:/datarepo/fix1/m210103.parquet
Column: 107
Row Group Start: 292
Fragment: 1:1

[Error Id: b96f48a5-8eaf-46e1-ad26-760946e5aba1 on TVSSRV02:31010]

1 statement failed.
Execution time: 8.67s

-- retry ---
...
File: d:/datarepo/fix1/m210212.parquet
Column: 10502
Row Group Start: 49816
File: d:/datarepo/fix1/m210212.parquet
Column: 10502
Row Group Start: 49816
Fragment: 1:1

[Error Id: a38bd17c-9ee5-4fef-8dd9-9673e95eae1b on TVSSRV02:31010]

1 statement failed.
Execution time: 8.81s
...
```

**Drill version**

- Drill 1.21.1
- LibreJDK jdk-17.0.14
- Windows 2019
- SQLWorkbench/J with org.apache.drill.jdbc driver

**Additional context**

- Server with >10G RAM, >80G disk, 4 cores; unlikely to be a resource issue.
- Total size of all Parquet files: 3.5 G
- Size of the Parquet files expected in the query (m25*): 256MB
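To double-check that the files named in the errors really fall outside the `where` filters, the file names can be tested against the same patterns outside of Drill. This is a minimal sketch in Python, not a statement about how Drill evaluates `filename`: it assumes that a `LIKE` pattern using only `%` maps onto shell-style globbing, and the helper names `like_match` and `selected` are hypothetical.

```python
from fnmatch import fnmatchcase

# LIKE patterns copied from the subquery and the outer where clause.
INNER = ["m25%"]
OUTER = ["m2501%", "m2502%", "m2503%"]


def like_match(name, pattern):
    """Hypothetical helper: emulate SQL LIKE for patterns that only use '%'."""
    return fnmatchcase(name, pattern.replace("%", "*"))


def selected(name):
    """True if the file passes both the subquery filter and the outer filter."""
    return (any(like_match(name, p) for p in INNER)
            and any(like_match(name, p) for p in OUTER))


# Files named in the three DATA_READ errors, plus the example file
# from the description.
for name in ["m210520.parquet", "m210103.parquet",
             "m210212.parquet", "m250131.parquet"]:
    print(name, selected(name))
```

Under these assumptions only `m250131.parquet` passes the filters, while the three files reported in the errors all start with `m21` and match none of the patterns.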