matteohexagon opened a new issue, #3014:
URL: https://github.com/apache/drill/issues/3014

   ### **Subject: Query Planner Fails to Validate Valid ABFSS Path with Wildcard (`**`)**
   
   **Component:** Storage - Azure
   
   **Apache Drill Version:** `1.22.0`
   
   **Summary:**
   
   A `SELECT` query against a specific directory path on Azure Blob Storage 
(using the ABFSS connector) fails during the validation phase with an "Object 
not found" error. However, Drill's own file listing tools (`SHOW FILES`) can 
see and list the contents of the exact same path, and a global wildcard query 
can read the data successfully.
   
   The issue appears to be a bug in the query planner's path validation logic. 
The planner behaves as if it holds a "stuck" or "corrupted" state for certain 
directories: it refuses to resolve them in `SELECT` statements while other 
parts of Drill access them without issue. The behavior persists after 
restarting the Drillbit and after deleting and recreating the storage plugin.
   
   **Environment:**
   
   *   **Storage Plugin:** `file`
   *   **Connection Type:** Azure Blob Storage 
(`abfss://<container>@<account>.dfs.core.windows.net`)
   *   **Authentication:** `SharedKey`
   
   **Storage Plugin Configuration:**
   
   ```json
   {
     "type": "file",
     "enabled": true,
     "connection": "abfss://<container>@<account>.dfs.core.windows.net",
     "config": {
       "fs.azure.account.auth.type": "SharedKey",
       "fs.azure.account.key.<account>.dfs.core.windows.net": "...",
       "fs.azure.createRemoteFileSystemDuringInitialization": "false",
       "fs.azure.io.list.recursive": "true"
     },
     "workspaces": {
       "root": {
         "location": "/",
         "writable": false,
         "allowRecursiveScan": true
       },
       "monthly": {
         "location": "/prod-condenser-logs-1-Month/",
         "writable": false,
         "allowRecursiveScan": true
       },
       "daily": {
         "location": "/prod-condenser-logs-1-day/",
         "writable": false,
         "allowRecursiveScan": true
       },
       "hourly": {
         "location": "/prod-condenser-logs-1-hour/",
         "writable": false,
         "allowRecursiveScan": true
       }
     },
     "formats": {
       "log": {
         "type": "logRegex",
         "extension": "log",
         "regex": "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) - (\\w+) - (.*)|^(.+)",
         "maxErrors": 100000,
         "schema": [
           {"fieldName": "log_timestamp", "fieldType": "TIMESTAMP", "format": "yyyy-MM-dd HH:mm:ss,SSS"},
           {"fieldName": "log_level"},
           {"fieldName": "structured_message"},
           {"fieldName": "unstructured_line"}
         ]
       }
     }
   }
   ```
   
   **Directory Structure on Azure:**
   
   ```
   /
   ├── prod-condenser-logs-1-Month/
   │   └── 2025/
   │       └── 07/
   ├── prod-condenser-logs-1-day/
   │   └── 2025/
   │       ├── 07/
   │       └── 08/
   └── prod-condenser-logs-1-hour/
       └── 2025/
           └── ...
   ```
   
   **Steps to Reproduce:**
   
   1.  **A query on a sibling directory works correctly:** The following query 
against the `...-1-Month` directory executes successfully every time.
   
       ```sql
       SELECT * FROM az.root.`prod-condenser-logs-1-Month/2025/**` LIMIT 10;
       ```
   
   2.  **An identical query on the target directory fails:** The following 
query against the `...-1-day` directory consistently fails.
   
       ```sql
       SELECT * FROM az.root.`prod-condenser-logs-1-day/2025/**` LIMIT 10;
       ```
   
   3.  **Drill's listing tools prove the path is visible:** Contradicting the 
query failure, the `SHOW FILES` command can see and list the contents of the 
failing directory, proving the path is valid and accessible to Drill.
   
       ```sql
       -- This command SUCCEEDS and lists the '2025' directory
       SHOW FILES FROM az.root.`prod-condenser-logs-1-day`;
       ```
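
The three statements above can be replayed in one pass through Drill's REST endpoint (`POST /query.json`), which makes the contrast between them easy to attach to this report. The following is a minimal sketch, not a tested reproduction: the host and port (`localhost:8047`), the absence of authentication, and the helper names `build_request` and `run` are all assumptions. Because the shape of an error response can differ between Drill versions, the sketch treats either an HTTP error status or an `errorMessage` field in the payload as a failure.

```python
import json
import urllib.error
import urllib.request

# Assumption: a Drillbit web endpoint on localhost without authentication.
DRILL_URL = "http://localhost:8047/query.json"

# The three statements from the steps above, keyed by a short label.
STATEMENTS = {
    "select_month": "SELECT * FROM az.root.`prod-condenser-logs-1-Month/2025/**` LIMIT 10",
    "select_day": "SELECT * FROM az.root.`prod-condenser-logs-1-day/2025/**` LIMIT 10",
    "show_files_day": "SHOW FILES FROM az.root.`prod-condenser-logs-1-day`",
}


def build_request(sql, url=DRILL_URL):
    """Wrap one SQL statement in the JSON body sent to /query.json."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run(sql, url=DRILL_URL):
    """Return (ok, detail); HTTP errors and error payloads count as failures."""
    try:
        with urllib.request.urlopen(build_request(sql, url), timeout=120) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as exc:
        # Some versions report query failures with a non-2xx status.
        return False, exc.read().decode("utf-8", errors="replace")[:300]
    except urllib.error.URLError as exc:
        return False, f"connection failed: {exc.reason}"
    if "errorMessage" in payload:
        return False, str(payload["errorMessage"])[:300]
    return True, f"{len(payload.get('rows', []))} rows"


if __name__ == "__main__":
    for label, sql in STATEMENTS.items():
        ok, detail = run(sql)
        print(f"{label}: {'OK' if ok else 'FAILED'} ({detail})")
```

If the behavior described in this report holds, `select_month` and `show_files_day` would be reported as `OK` and `select_day` as `FAILED` with the validation error.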
   
   **Expected Behavior:**
   
   The `SELECT` query against `` az.root.`prod-condenser-logs-1-day/2025/**` `` 
should execute successfully, just as the query against the sibling 
`...-1-Month` directory does.
   
   **Actual Behavior:**
   
   The query fails during the validation phase with the error:
   `VALIDATION ERROR: ... Object 'prod-condenser-logs-1-day/2025/**' not found 
within 'az.root'`
   
   **Troubleshooting Steps Attempted (All Failed to Resolve the Issue):**
   
   *   **Restarting the Drillbit:** The issue persists immediately after a full 
restart.
   *   **Deleting and Recreating the Storage Plugin:** The exact same behavior 
occurs after completely removing the `az` plugin and recreating it from the 
saved configuration.
   *   **Renaming/Duplicating the Source Directory:** Renaming the directory in 
Azure to a new name (e.g., `prod-condenser-logs-daily-new`) and querying it 
results in the same "Object not found" error.
   *   **Using Defined Workspaces:** Querying via the `az.daily` workspace 
(e.g., `` FROM az.daily.`2025/**` ``) also fails with the same error, even though 
`SHOW FILES IN az.daily` correctly lists the contents.
   *   **`REFRESH TABLE METADATA`:** This command fails because Drill does not 
recognize the paths as tables.
   
   **Final Workaround Discovered:**
   
   The only reliable method to query the data in the affected directories is to 
use a global wildcard from the root (`` FROM az.root.`**` ``) and then filter for 
the desired path in a `WHERE` clause. This shows that the data is readable and 
indicates that the failure is specific to the planner's path validation.
   
   ```sql
   -- This query WORKS and returns data from the '...-1-day' directory
   SELECT *
   FROM az.root.`**`
   WHERE filepath LIKE '%/prod-condenser-logs-1-day/%'
   LIMIT 10;
   ```
   
   This workaround suggests the core data reading engine is functional, but the 
upfront query validation is failing on specific path strings.
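
For anyone who needs the workaround for several directories, the query text can be generated rather than hand-edited. This is only an illustrative sketch: the helper name `workaround_query` and its parameters are hypothetical, and it simply reproduces the root-wildcard query shown above for a given directory name.

```python
def workaround_query(directory, workspace="az.root", limit=10):
    """Build the root-wildcard SELECT that filters on the filepath column.

    Note: '_' and '%' in the directory name act as LIKE wildcards and are
    not escaped here; quotes and backticks are rejected outright.
    """
    name = directory.strip("/")
    if not name or "'" in name or "`" in name:
        raise ValueError(f"unsupported directory name: {directory!r}")
    return (
        "SELECT *\n"
        f"FROM {workspace}.`**`\n"
        f"WHERE filepath LIKE '%/{name}/%'\n"
        f"LIMIT {int(limit)};"
    )


if __name__ == "__main__":
    print(workaround_query("prod-condenser-logs-1-day"))
```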

