egasimov opened a new issue, #2941:
URL: https://github.com/apache/drill/issues/2941

   Hello Drill community,
   Recently, we have encountered with the following issue(` 
UNSUPPORTED_OPERATION ERROR: HashJoinBatch does not support schema changes in 
build side.
   Prior schema`), when joining two CTEs.
   
   There two CTEs which having filter conditions from list of items. and need 
to join based on given condition.
   it seems, after join operation, schema changes for field `filtered__items1`
   [`filtered__items1` (NULL:OPTIONAL)]]
   [`filtered__items1` (MAP:REPEATED)
   
   **Executed queries:**
   ```
   WITH cte_1 AS (
     SELECT t.customer_id as customer_id1, t.customer_name as customer_name1,
            COLLECT_LIST('val', t.p_item) filtered__items1
     FROM (
       SELECT d.customer_id, d.customer_name, flatten(d.purchased_items) AS 
p_item 
       FROM dfs.root.`/datas3/customers/*` d
     ) t
     WHERE t.p_item['product_id'] IN (777, 888)
     GROUP BY t.customer_id, t.customer_name
   ),
   
   cte_2 AS (
     SELECT t.customer_id as customer_id2, t.customer_name as customer_name2,
            COLLECT_LIST('val', t.p_item) filtered__items2
     FROM (
       SELECT d.customer_id, d.customer_name, flatten(d.purchased_items) AS 
p_item 
       FROM dfs.root.`/datas3/customers/*` d
     ) t
     WHERE t.p_item['product_id'] IN (999)
     GROUP BY t.customer_id, t.customer_name
   )
   
   SELECT cte_1.*
   FROM cte_1
   JOIN cte_2 ON cte_1.customer_id1 = cte_2.customer_id2;
   
   ```
   
   **Error details:**
   ```
   
-------------------------------------------------------------------------------------------------------
   org.apache.drill.common.exceptions.UserRemoteException: 
UNSUPPORTED_OPERATION ERROR: HashJoinBatch does not support schema changes in 
build side.
   Prior schema : 
   BatchSchema [fields=[[`customer_id1` (BIGINT:OPTIONAL)], [`customer_name1` 
(VARCHAR:OPTIONAL)], [`filtered__items1` (NULL:OPTIONAL)]], 
selectionVector=NONE]
   New schema : 
   BatchSchema [fields=[[`customer_id1` (BIGINT:OPTIONAL)], [`customer_name1` 
(VARCHAR:OPTIONAL)], [`filtered__items1` (MAP:REPEATED), children=([`val` 
(MAP:REQUIRED), children=([`item_id` (BIGINT:OPTIONAL)], [`item_class` 
(VARCHAR:OPTIONAL)], [`product_id` (BIGINT:OPTIONAL)], [`created_at` 
(VARCHAR:OPTIONAL)])])]], selectionVector=NONE]
   
   Unsupported schema change
   Fragment: 0:0
   
   [Error Id: 6b857b70-583e-4fb0-ab02-d1e6b536c569 on 8a288c59e4a7:31010]
   ------------------------------------------------------------------------
   
   ```
     
   **Drill version**
   Apache drill version: 1.21.2
   
   **Additional context**
   Test data
   
   ```
   [
       {
       "customer_id": 1000001,
       "customer_name": "John Doe",
       "purchased_items": [
           {
               "item_id": 2000001,
               "item_class": "A",
               "product_id": 777,
               "created_at": "2024-06-12T11:36:37.751Z"
           },
           {
               "item_id": 2000002,
               "item_class": "B",
               "product_id": 888,
               "created_at": "2024-06-12T08:46:37.751Z"
           },
           {
               "item_id": 2000003,  # Corrected item_id for uniqueness
               "item_class": "C",
               "product_id": 999,
               "created_at": "2024-06-12T11:56:00.751Z"
           }
       ]
       },
       {
       "customer_id": 1000002,
       "customer_name": "Black Smith",
       "purchased_items": [
           {
               "item_id": 2000004,
               "item_class": "A",
               "product_id": 777,
               "created_at": "2024-06-12T11:36:33.751Z"
           },
           {
               "item_id": 2000006,  # Corrected item_id for uniqueness
               "item_class": "C",
               "product_id": 999,
               "created_at": "2024-08-12T11:56:37.751Z"
           }
       ]
       },
           {
       "customer_id": 1000003,
       "customer_name": "Alice Doe",
       "purchased_items": [
           {
               "item_id": 2000010,
               "item_class": "A",
               "product_id": 777,
               "created_at": "2024-01-12T11:36:37.751Z"
           },
           {
               "item_id": 2000011,  # Corrected item_id for uniqueness
               "item_class": "C",
               "product_id": 888,
               "created_at": "2024-04-12T11:56:37.751Z"
           }
       ]
       }
   ]
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@drill.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to