Thanks Jacek for raising that discussion. I cannot think of a scenario where it would be useful to specify a LIMIT in bytes. The LIMIT clause is usually used when you know how many rows you wish to display or use. Unless somebody has a useful scenario in mind, I do not think that there is a need for that feature.
Paging in bytes makes sense to me, as the paging mechanism is transparent for the user in most drivers. It is simply a way to optimize your memory usage from end to end. I do not like the approach of using both limits simultaneously, because if you request a page with a certain number of rows and do not get it, it is really confusing and can be a problem for some use cases. We have users who keep their session open, along with the paging information, in order to display pages of data.

On Mon, 12 Jun 2023 at 09:08, Jacek Lewandowski <lewandowski.ja...@gmail.com> wrote:

> Hi,
>
> I was working on limiting query results by their size expressed in bytes, and some questions arose that I'd like to bring to the mailing list.
>
> The semantics of queries (without aggregation): data limits are applied on the raw data returned from replicas. While that works fine for row number limits, as the number of rows is not likely to change after post-processing, it is not that accurate for size-based limits, as the cell sizes may be different after post-processing (for example due to applying some transformation function, projection, or whatever).
>
> We can truncate the results after post-processing to stay within the user-provided limit in bytes, but if the result is smaller than the limit - we will not fetch more. In that case, the meaning of "limit" as an actual limit still holds, though it would be misleading for the page size, because we will not fetch the maximum amount of data that does not exceed the page size.
>
> Such a problem is much more visible for "group by" queries with aggregation. The paging and limiting mechanism is applied to the rows rather than the groups, as it has no information about how much memory a single group uses. For now, I've approximated a group size as the size of the largest participating row.
>
> The problem concerns the allowed interpretation of the size limit expressed in bytes: whether we want to use this mechanism to let the users precisely control the size of the result set, or we instead want to use it to limit the amount of memory used internally for the data and prevent problems (assuming restricting size and row number can be used simultaneously, in a way that we stop when we reach any of the specified limits).
>
> https://issues.apache.org/jira/browse/CASSANDRA-11745
>
> thanks,
> - - -- --- ----- -------- -------------
> Jacek Lewandowski
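
To make the transparency point above a bit more concrete, here is a minimal sketch using the DataStax Java driver 4.x. The keyspace, table and column names are made up for the example, and the byte-based page size discussed in the ticket would be an analogous, currently hypothetical knob rather than an existing driver API:

    import com.datastax.oss.driver.api.core.CqlSession;
    import com.datastax.oss.driver.api.core.cql.ResultSet;
    import com.datastax.oss.driver.api.core.cql.Row;
    import com.datastax.oss.driver.api.core.cql.SimpleStatement;

    public class PagingSketch {
        public static void main(String[] args) {
            // Assumes a reachable cluster and an existing ks.events table (hypothetical names).
            try (CqlSession session = CqlSession.builder().withKeyspace("ks").build()) {
                // Row-based page size: ask for at most 500 rows per page.
                SimpleStatement stmt =
                    SimpleStatement.builder("SELECT * FROM events WHERE user_id = 42")
                                   .setPageSize(500)
                                   .build();
                ResultSet rs = session.execute(stmt);
                int count = 0;
                for (Row row : rs) {
                    // The driver fetches subsequent pages transparently while we iterate;
                    // application code never sees page boundaries.
                    count++;
                }
                System.out.println("rows read: " + count);
            }
        }
    }

A page size expressed in bytes would keep exactly this programming model and only change what each background fetch is bounded by, which is why it reads to me as a memory-usage knob rather than a query-semantics feature.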