lostmygithubaccount opened a new issue, #10830:
URL: https://github.com/apache/datafusion/issues/10830

   ### Describe the bug
   
   in running some benchmarking, was getting errors like:
   
   ```
   Exception: This feature is not implemented: Physical plan does not support 
logical expression InSubquery(InSubquery { expr: Column(Column { relation: 
None, name: "ps_suppkey" }), subquery: <subquery>, negated: false })
   ```
   
   tracking this down shows `x NOT IN y` works but `NOT (x IN y) doesn't`
   
   ### To Reproduce
   
   probably a more succinct way, but:
   
   ```python
   import datafusion
   
   ctx = datafusion.SessionContext()
   
   tables = {
       "part": {
           "p_partkey": [1, 2, 3, 4, 5],
           "p_brand": ["Brand#45", "Brand#45", "Brand#46", "Brand#47", 
"Brand#48"],
           "p_type": ["MEDIUM POLISHED", "MEDIUM POLISHED", "MEDIUM POLISHED", 
"MEDIUM POLISHED", "MEDIUM POLISHED"],
           "p_size": [49, 14, 23, 45, 19],
       },
       "partsupp": {
           "ps_partkey": [1, 2, 3, 4, 5],
           "ps_suppkey": [1, 2, 3, 4, 5],
       },
       "supplier": {
           "s_suppkey": [1, 2, 3, 4, 5],
           "s_comment": ["Customer Complaints", "Customer Complaints", 
"Customer Complaints", "Customer Complaints", "Customer Complaints"],
       },
   }
   
   for table_name, data in tables.items():
       ctx.from_pydict(data, name=table_name)
   
   sqlA = """
   select
       p_brand,
       p_type,
       p_size,
       count(distinct ps_suppkey) as supplier_cnt
   from
       partsupp,
       part
   where
           p_partkey = ps_partkey
     and p_brand <> 'Brand#45'
     and p_type not like 'MEDIUM POLISHED%'
     and p_size in (49, 14, 23, 45, 19, 3, 36, 9)
     and ps_suppkey not in (
       select
           s_suppkey
       from
           supplier
       where
               s_comment like '%Customer%Complaints%'
   )
   group by
       p_brand,
       p_type,
       p_size
   order by
       supplier_cnt desc,
       p_brand,
       p_type,
       p_size;
   """
   
   ctx.sql(sqlA).collect() # this is fine
   
   sqlB = """
   SELECT
     p_brand,
     p_type,
     p_size,
     COUNT(DISTINCT ps_suppkey) AS supplier_cnt
   FROM partsupp, part
   WHERE
     p_partkey = ps_partkey
     AND p_brand <> 'Brand#45'
     AND NOT p_type LIKE 'MEDIUM POLISHED%'
     AND p_size IN (49, 14, 23, 45, 19, 3, 36, 9)
     AND NOT ps_suppkey IN (
       SELECT
         s_suppkey
       FROM supplier
       WHERE
         s_comment LIKE '%Customer%Complaints%'
     )
   GROUP BY
     p_brand,
     p_type,
     p_size
   ORDER BY
     supplier_cnt DESC NULLS LAST,
     p_brand,
     p_type,
     p_size;
   """
   
   ctx.sql(sqlB).collect() # this is not fine
   
   # Exception: This feature is not implemented: Physical plan does not support 
logical expression InSubquery(InSubquery { expr: Column(Column { relation: 
Some(Bare { table: "partsupp" }), name: "ps_suppkey" }), subquery: <subquery>, 
negated: false })
   ```
   
   
   ```
   
   ### Expected behavior
   
   query works in both cases
   
   ### Additional context
   
   SQLGlot makes the above transformation that should be valid, which is how I 
hit this initially 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to