matteohexagon opened a new issue, #3014: URL: https://github.com/apache/drill/issues/3014
### **Subject: Query Planner Fails to Validate Valid ABFSS Path with Wildcard (`**`)**

**Component:** Storage - Azure

**Apache Drill Version:** `1.22.0`

**Summary:**

A `SELECT` query against a specific directory path on Azure Blob Storage (using the ABFSS connector) fails during the validation phase with an "Object not found" error. However, Drill's own file listing tools (`SHOW FILES`) can see and list the contents of the exact same path, and a global wildcard query can read the data successfully.

The issue appears to be a bug in the query planner's path validation logic. The planner seems to develop a "stuck" or "corrupted" state for certain directory names, refusing to acknowledge them in `SELECT` statements while other parts of Drill can access them without issue. The bug persists even after restarting the Drillbit and completely deleting/recreating the storage plugin.

**Environment:**

* **Storage Plugin:** `file`
* **Connection Type:** Azure Blob Storage (`abfss://<container>@<account>.dfs.core.windows.net`)
* **Authentication:** `SharedKey`

**Storage Plugin Configuration:**

```json
{
  "type": "file",
  "enabled": true,
  "connection": "abfss://<container>@<account>.dfs.core.windows.net",
  "config": {
    "fs.azure.account.auth.type": "SharedKey",
    "fs.azure.account.key.observercondenseddata.dfs.core.windows.net": "...",
    "fs.azure.createRemoteFileSystemDuringInitialization": "false",
    "fs.azure.io.list.recursive": "true"
  },
  "workspaces": {
    "root": {
      "location": "/",
      "writable": false,
      "allowRecursiveScan": true
    },
    "monthly": {
      "location": "/prod-condenser-logs-1-Month/",
      "writable": false,
      "allowRecursiveScan": true
    },
    "daily": {
      "location": "/prod-condenser-logs-1-day/",
      "writable": false,
      "allowRecursiveScan": true
    },
    "hourly": {
      "location": "/prod-condenser-logs-1-hour/",
      "writable": false,
      "allowRecursiveScan": true
    }
  },
  "formats": {
    "log": {
      "type": "logRegex",
      "extension": "log",
      "regex": "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) - (\\w+) - (.*)|^(.+)",
      "maxErrors": 100000,
      "schema": [
        {"fieldName": "log_timestamp", "fieldType": "TIMESTAMP", "format": "yyyy-MM-dd HH:mm:ss,SSS"},
        {"fieldName": "log_level"},
        {"fieldName": "structured_message"},
        {"fieldName": "unstructured_line"}
      ]
    }
  }
}
```

**Directory Structure on Azure:**

```
/
├── prod-condenser-logs-1-Month/
│   └── 2025/
│       └── 07/
├── prod-condenser-logs-1-day/
│   └── 2025/
│       ├── 07/
│       └── 08/
└── prod-condenser-logs-1-hour/
    └── 2025/
        └── ...
```

**Steps to Reproduce:**

1. **A query on a sibling directory works correctly:** The following query against the `...-1-Month` directory executes successfully every time.

   ```sql
   SELECT * FROM az.root.`prod-condenser-logs-1-Month/2025/**` LIMIT 10;
   ```

2. **An identical query on the target directory fails:** The following query against the `...-1-day` directory consistently fails.

   ```sql
   SELECT * FROM az.root.`prod-condenser-logs-1-day/2025/**` LIMIT 10;
   ```

3. **Drill's listing tools prove the path is visible:** Contradicting the query failure, the `SHOW FILES` command can see and list the contents of the failing directory, proving the path is valid and accessible to Drill.

   ```sql
   -- This command SUCCEEDS and shows the '2025' directory within
   SHOW FILES FROM az.root.`prod-condenser-logs-1-day`;
   ```

**Expected Behavior:**

The `SELECT` query against ``az.root.`prod-condenser-logs-1-day/2025/**` `` should execute successfully, just as the query against the sibling `...-1-Month` directory does.

**Actual Behavior:**

The query fails during the validation phase with the error:

`VALIDATION ERROR: ... Object 'prod-condenser-logs-1-day/2025/**' not found within 'az.root'`

**Troubleshooting Steps Attempted (All Failed to Resolve the Issue):**

* **Restarting the Drillbit:** The issue persists immediately after a full restart.
* **Deleting and Recreating the Storage Plugin:** The exact same behavior occurs after completely removing the `az` plugin and recreating it from the saved configuration.
* **Renaming/Duplicating the Source Directory:** Renaming the directory in Azure to a new name (e.g., `prod-condenser-logs-daily-new`) and querying it results in the same "Object not found" error.
* **Using Defined Workspaces:** Querying via the `az.daily` workspace (e.g., ``FROM az.daily.`2025/**` ``) also fails with the same error, even though `SHOW FILES IN az.daily` correctly lists the contents.
* **`REFRESH TABLE METADATA`:** This command fails because Drill does not recognize the paths as tables.

**Final Workaround Discovered:**

The only reliable method to query the data in the affected directories is to use a global wildcard from the root (``FROM az.root.`**` ``) and then filter the desired path using a `WHERE` clause. This proves the data is readable and the bug is specific to the planner's path validation.

```sql
-- This query WORKS and returns data from the '...-1-day' directory
SELECT *
FROM az.root.`**`
WHERE filepath LIKE '%/prod-condenser-logs-1-day/%'
LIMIT 10;
```

This workaround suggests the core data reading engine is functional, but the upfront query validation is failing on specific path strings.
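
For anyone who needs to apply the same workaround to several directories, the query text can be generated instead of hand-edited. The following is only a minimal sketch: the helper name `build_workaround_query` is hypothetical and not part of Drill, it assumes the `az.root` workspace from the configuration above, and it only builds the SQL string (it does not connect to Drill). Directory names containing `'`, `%`, or `_` are rejected rather than escaped, since the last two act as wildcards in `LIKE`.

```python
def build_workaround_query(directory: str, limit: int = 10) -> str:
    """Build the root-wildcard workaround query for one top-level directory.

    Hypothetical helper for illustration; it mirrors the workaround query
    from the report and assumes the plugin/workspace is named az.root.
    """
    name = directory.strip("/")
    # Reject characters that would change the meaning of the LIKE pattern
    # or terminate the SQL string literal.
    if not name or any(ch in name for ch in "'%_"):
        raise ValueError(f"unsupported directory name: {directory!r}")
    return (
        "SELECT * FROM az.root.`**` "
        f"WHERE filepath LIKE '%/{name}/%' "
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Print the workaround query for the directory that fails validation.
    print(build_workaround_query("prod-condenser-logs-1-day"))
```

Rejecting unusual names keeps the sketch short; a fuller version could escape them with a `LIKE ... ESCAPE` clause instead.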