neilconway opened a new issue, #20415:
URL: https://github.com/apache/datafusion/issues/20415

   ### Is your feature request related to a problem or challenge?
   
   We currently implement uncorrelated scalar subqueries as joins. For example, 
the query from #18181 is (after inlining a CTE):
   
   ```sql
   SELECT COUNT(*)
       FROM places p
       WHERE array_has_any(p.fsq_category_ids, (SELECT array_agg(category_id) 
AS category_ids FROM categories_raw LIMIT 500);
   ```
   
   Transforming this into a join has a few shortcomings:
   1. We currently don't enforce that the subquery must return 0 or 1 rows, 
which is required by the SQL spec.
   2. Transforming the subquery into a join incurs the overhead of evaluating 
the join itself.
   3. Perhaps most importantly, this strategy eagerly broadcasts the scalar 
value produced by the subquery into an entire column of repeated values. This 
prevents UDFs from taking fast-paths that take advantage of being called with a 
scalar input. In the case of the query above, after landing #20392, this would 
improve query performance by roughly 300x
   
   ### Describe the solution you'd like
   
   I'd like to address this by changing how uncorrelated subqueries are 
represented and evaluated. This would roughly follow how uncorrelated 
subqueries are handled in Postgres:
   
   (A) Have a distinct expr type for a scalar subquery
   (B) Plan the subquery independently
   (C) Execute it once as part of executing the main plan
   (D) To evaluate the scalar subquery expression, arrange to lookup and return 
the result of (C)
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to