Yes, LIMIT BY <bytes> provided by the user in CQL does not make much sense to me either
On Mon, 12 Jun 2023 at 11:20, Benedict <bened...@apache.org> wrote:

> I agree that this is more suitable as a paging option, and not as a CQL
> LIMIT option.
>
> If it were to be a CQL LIMIT option though, then it should be accurate
> regarding the result set IMO; there shouldn’t be any further results that
> could have been returned within the LIMIT.
>
> On 12 Jun 2023, at 10:16, Benjamin Lerer <ble...@apache.org> wrote:
>
> Thanks Jacek for raising that discussion.
>
> I do not have in mind a scenario where it could be useful to specify a
> LIMIT in bytes. The LIMIT clause is usually used when you know how many
> rows you wish to display or use. Unless somebody has a useful scenario in
> mind I do not think that there is a need for that feature.
>
> Paging in bytes makes sense to me as the paging mechanism is transparent
> for the user in most drivers. It is simply a way to optimize your memory
> usage from end to end.
>
> I do not like the approach of using both of them simultaneously because if
> you request a page with a certain number of rows and do not get it then it
> is really confusing and can be a problem for some use cases. We have users
> keeping their session open and the page information to display pages of
> data.
>
> On Mon, 12 Jun 2023 at 09:08, Jacek Lewandowski <
> lewandowski.ja...@gmail.com> wrote:
>
>> Hi,
>>
>> I was working on limiting query results by their size expressed in bytes,
>> and some questions arose that I'd like to bring to the mailing list.
>>
>> The semantics of queries (without aggregation): data limits are applied
>> on the raw data returned from replicas. While this works fine for the row
>> number limits, as the number of rows is not likely to change after
>> post-processing, it is not that accurate for size-based limits, as the
>> cell sizes may be different after post-processing (for example due to
>> applying some transformation function, projection, or whatever).
>>
>> We can truncate the results after post-processing to stay within the
>> user-provided limit in bytes, but if the result is smaller than the limit,
>> we will not fetch more. In that case, the meaning of "limit" as an actual
>> limit is valid, though it would be misleading for the page size because
>> we will not fetch the maximum amount of data that does not exceed the
>> page size.
>>
>> Such a problem is much more visible for "group by" queries with
>> aggregation. The paging and limiting mechanism is applied to the rows
>> rather than groups, as it has no information about how much memory a
>> single group uses. For now, I've approximated a group size as the size of
>> the largest participating row.
>>
>> The problem concerns the allowed interpretation of the size limit
>> expressed in bytes: whether we want to use this mechanism to let the
>> users precisely control the size of the result set, or we instead want to
>> use this mechanism to limit the amount of memory used internally for the
>> data and prevent problems (assuming that the size and row count
>> restrictions can be used simultaneously, in a way that we stop when we
>> reach any of the specified limits).
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-11745
>>
>> thanks,
>> - - -- --- ----- -------- -------------
>> Jacek Lewandowski
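
For readers following the thread, the "stop when we reach any of the specified limits" behaviour from Jacek's message, and the reason a byte limit on raw data is inexact after post-processing, can be illustrated with a small sketch. This is not Cassandra code: the function name `take_until_limit`, its parameters, and the rule that the first row is always kept are assumptions made only for this illustration.

```python
from typing import Iterable, List, Optional


def take_until_limit(
    rows: Iterable[bytes],
    max_rows: Optional[int] = None,
    max_bytes: Optional[int] = None,
) -> List[bytes]:
    """Collect raw rows until either the row limit or the byte limit is hit."""
    taken: List[bytes] = []
    used = 0
    for row in rows:
        if max_rows is not None and len(taken) >= max_rows:
            break
        # Assumption: a row that would exceed the byte budget is left out,
        # except for the first row, which is always kept to guarantee progress.
        if max_bytes is not None and taken and used + len(row) > max_bytes:
            break
        taken.append(row)
        used += len(row)
    return taken


raw = [b"x" * 10 for _ in range(5)]       # five raw rows, 10 bytes each
page = take_until_limit(raw, max_rows=100, max_bytes=30)
projected = [row[:2] for row in page]     # post-processing shrinks each row
raw_size = sum(len(r) for r in page)
final_size = sum(len(r) for r in projected)
```

In this sketch the limit is enforced on the raw rows, so the page stops at three rows (30 raw bytes), while the projected result is far smaller than the requested 30 bytes and no further rows are fetched to fill it. That is the mismatch between "limit" and "page size" semantics raised in the original message.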