Rachelint commented on PR #15851: URL: https://github.com/apache/datafusion/pull/15851#issuecomment-2832489248
> > But I want to improve it in a new PR, so as not to block #15591, is that OK?
>
> Sure.
>
> > We should ensure all ordering situations can be covered.
> > At least 1 full ordering + 1 partial ordering + 1 no ordering.
> > Like dataset sorted by a,b, can ensure at least following three cases:
>
> 1. We need to generate random queries covering all the cases easily AND
> 2. We need to generate a random specific query easily too.
>
> I think the issue with the current implementation is that the order is defined once at the dataset level by **column name**, so every generated query takes its ordering information from the dataset.
>
> Dataset1: a, b ordered, c not ordered. Any generated query has the same ordering as the dataset defines.
>
> ```
> SELECT xxx from xxx GROUP BY a,b (a, b ordered)
> SELECT xxx from xxx GROUP BY a,c (a ordered, c not ordered)
> SELECT xxx from xxx GROUP BY b (b ordered)
> ```
>
> But I want the ordering to be given by a specific index for each query. Assume I want the first column ordered and the second column unordered.
>
> Dataset2: a,b,c,d,e ....
>
> These are randomly generated queries with a fixed number of columns (2).
>
> ```
> SELECT xxx from xxx GROUP BY a,b (a ordered, b unordered)
> SELECT xxx from xxx GROUP BY a,c (a ordered, c unordered)
> SELECT xxx from xxx GROUP BY c,d (c ordered, d unordered)
> SELECT xxx from xxx GROUP BY d,e (d ordered, e unordered)
> ```
>
> I expect it to be **index-specific** ordering, not based on the **column name**.
>
> With `index-specific` ordering, we can expect the generated query to always be fully ordered or partially ordered. In the current implementation, we can't guarantee this.

Thanks @jayzhan211. Makes sense, it is indeed better to have the `sql define dataset sorting` rather than the `pre-sorted dataset define sql`. I will try to do it in follow-up PRs.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
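
For illustration only, the index-specific idea from the quoted comment could be sketched roughly as below. This is not the DataFusion fuzz-test API: `generate_query`, `ordering_kind`, the table name `t`, and the `COUNT(*)` projection are all hypothetical names chosen for the sketch, and Python is used only to keep it short. The point it tries to show is that the ordering pattern is attached to GROUP BY positions, and the dataset sort key is derived from each generated query rather than fixed up front by column name.

```python
import random
from typing import List, Sequence, Tuple


def ordering_kind(ordered_flags: Sequence[bool]) -> str:
    """Classify an index-specific pattern as "none", "partial" or "full"."""
    if not any(ordered_flags):
        return "none"
    if all(ordered_flags):
        return "full"
    return "partial"


def generate_query(
    columns: Sequence[str],
    ordered_flags: Sequence[bool],
    rng: random.Random,
    table: str = "t",  # hypothetical table name
) -> Tuple[str, List[str], List[str]]:
    """Pick random GROUP BY columns, then derive the sort key from positions.

    ordered_flags[i] says whether the i-th GROUP BY column should be ordered,
    regardless of which column name ends up at that position.
    """
    if len(ordered_flags) > len(columns):
        raise ValueError("pattern is longer than the number of columns")
    # Choose distinct columns for the GROUP BY positions.
    group_by = rng.sample(list(columns), k=len(ordered_flags))
    # The dataset for this query is sorted by the columns at ordered positions.
    sort_key = [col for col, flag in zip(group_by, ordered_flags) if flag]
    cols = ", ".join(group_by)
    # COUNT(*) is a placeholder for the "xxx" projection in the comment.
    sql = f"SELECT {cols}, COUNT(*) FROM {table} GROUP BY {cols}"
    return sql, group_by, sort_key


if __name__ == "__main__":
    rng = random.Random(42)
    pattern = [True, False]  # first column ordered, second unordered
    for _ in range(4):
        sql, _, sort_key = generate_query("abcde", pattern, rng)
        print(f"{sql}  -- sort dataset by {sort_key} ({ordering_kind(pattern)})")
```

With a fixed pattern such as `[True, False]`, every generated query is partially ordered whichever columns are drawn, which is the guarantee the comment says the column-name approach cannot give.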
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org