Rachelint commented on PR #15851:
URL: https://github.com/apache/datafusion/pull/15851#issuecomment-2832489248

   > > But I want to improve it in a new PR so as not to block #15591. Is that OK?
   > 
   > Sure.
   > 
   > > We should ensure all ordering situations are covered:
   > > at least 1 full ordering + 1 partial ordering + 1 no ordering.
   > > For example, with a dataset sorted by a, b, we can ensure at least the following three cases:
   > 
   >     1. We need to generate random queries that cover all the cases easily, AND
   > 
   >     2. We need to generate a specific random query easily too.
   > 
   > 
   > I think the issue with the current implementation is that the order is defined once at the dataset level by **column name**, so every generated query takes its ordering information from the dataset.
   > 
   > Dataset1: a, b ordered, c not ordered. Every generated query has the same ordering as the dataset defines.
   > 
   > ```
   > SELECT xxx from xxx GROUP BY a,b (a, b ordered)
   > SELECT xxx from xxx GROUP BY a,c (a ordered, c unordered)
   > SELECT xxx from xxx GROUP BY b (b ordered)
   > ```
   > 
   > But I want the ordering tied to a specific index for each query. Assume I want the first column ordered and the second column unordered.
   > 
   > Dataset2: a,b,c,d,e ....
   > 
   > These are randomly generated queries with a fixed number of columns (2).
   > 
   > ```
   > SELECT xxx from xxx GROUP BY a,b (a ordered, b unordered)
   > SELECT xxx from xxx GROUP BY a,c (a ordered, c unordered)
   > SELECT xxx from xxx GROUP BY c,d (c ordered, d unordered)
   > SELECT xxx from xxx GROUP BY d,e (d ordered, e unordered)
   > ```
   > 
   > I expect it to be **index-specific** ordering, not ordering based on the **column name**.
   > 
   > With `index-specific` ordering, we can expect every generated query to be fully ordered or partially ordered. The current implementation can't guarantee this.
   
   Thanks @jayzhan211 
   Makes sense. It is indeed better to have `sql define dataset sorting` rather than `pre-sorted dataset define sql`.
   I will try to do it in follow-up PRs.
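
A minimal sketch of the index-specific idea discussed above, written in Python for brevity rather than as the actual DataFusion fuzz-test code. All names here (`GroupByCase`, `generate_case`, the table name `t`, the `count(*)` aggregate) are hypothetical and only illustrate the approach: the ordering pattern is given per GROUP BY position, and the dataset sort keys are derived from the generated query instead of being fixed by column name.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupByCase:
    """One generated test case (hypothetical structure, for illustration only)."""
    sql: str           # generated query text
    group_by: tuple    # GROUP BY columns, in query order
    sort_keys: tuple   # columns the dataset would need to be pre-sorted by


def generate_case(columns, ordered_flags, rng, table="t"):
    """Pick len(ordered_flags) distinct columns at random.

    Position i of the GROUP BY list is treated as ordered iff
    ordered_flags[i] is True, regardless of which column lands there.
    """
    group_by = tuple(rng.sample(columns, len(ordered_flags)))
    # The query decides the dataset sorting: keep only the flagged positions.
    sort_keys = tuple(c for c, flag in zip(group_by, ordered_flags) if flag)
    sql = f"SELECT count(*) FROM {table} GROUP BY {', '.join(group_by)}"
    return GroupByCase(sql=sql, group_by=group_by, sort_keys=sort_keys)


columns = ["a", "b", "c", "d", "e"]
rng = random.Random(0)  # fixed seed so runs are reproducible

# Partial ordering: first column ordered, second column unordered.
partial = [generate_case(columns, (True, False), rng) for _ in range(4)]
# Full ordering and no ordering use the same generator with other flags.
full = generate_case(columns, (True, True), rng)
none = generate_case(columns, (False, False), rng)

for case in partial:
    print(case.sql, "| sort dataset by:", case.sort_keys)
```

With this shape, the three situations mentioned earlier (full, partial, and no ordering) are selected by the flag tuple alone, so each one can be requested deterministically while the columns themselves stay random. Unlike the hand-written examples above, this sketch may pick the columns in any order, for example `c, a`.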
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

