Re: [DISCUSS] Limiting query results by size (CASSANDRA-11745)

Benjamin Lerer Mon, 12 Jun 2023 02:16:27 -0700

Thanks Jacek for raising that discussion.

I cannot think of a scenario where specifying a LIMIT in bytes would be
useful. The LIMIT clause is usually used when you know how many rows you
wish to display or use. Unless somebody has a concrete use case in mind, I
do not think that there is a need for that feature.


Paging in bytes makes sense to me, as the paging mechanism is transparent
to the user in most drivers. It is simply a way to optimize your memory
usage from end to end.

I do not like the approach of using both of them simultaneously, because if
you request a page with a certain number of rows and do not get it, it is
really confusing and can be a problem for some use cases. We have users
keeping their session open, along with the page information, to display
pages of data.
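
To make the concern concrete, here is a minimal sketch of a page builder
that stops at whichever limit is hit first. It is not Cassandra's code: the
build_page helper, its parameters and the row sizes are all made up for
illustration.

```python
from typing import Iterable, List, Optional


def build_page(rows: Iterable[bytes],
               max_rows: Optional[int] = None,
               max_bytes: Optional[int] = None) -> List[bytes]:
    """Collect rows until either limit would be exceeded (hypothetical helper)."""
    page: List[bytes] = []
    used = 0
    for row in rows:
        # Stop once the requested number of rows has been collected.
        if max_rows is not None and len(page) >= max_rows:
            break
        # Stop if this row would push the page over the byte budget.
        # The first row is always admitted so an oversized row cannot stall paging.
        if max_bytes is not None and page and used + len(row) > max_bytes:
            break
        page.append(row)
        used += len(row)
    return page


rows = [b"x" * 40 for _ in range(10)]  # ten rows of 40 bytes each
page = build_page(rows, max_rows=5, max_bytes=100)
print(len(page))  # fewer rows than the 5 requested, because of the byte limit
```

With both limits set, a client that asked for 5 rows per page gets a shorter
page even though more rows exist, which is exactly what breaks a UI that
expects fixed-size pages.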

On Mon, 12 Jun 2023 at 09:08, Jacek Lewandowski <lewandowski.ja...@gmail.com>
wrote:

> Hi,
>
> I was working on limiting query results by their size expressed in bytes,
> and some questions arose that I'd like to bring to the mailing list.
>
> The semantics of queries (without aggregation): data limits are applied
> on the raw data returned from replicas. This works fine for row-count
> limits, as the number of rows is not likely to change after
> post-processing, but it is less accurate for size-based limits, as the
> cell sizes may be different after post-processing (for example due to
> applying some transformation function or a projection).
>
> We can truncate the results after post-processing to stay within the
> user-provided limit in bytes, but if the result is smaller than the limit -
> we will not fetch more. In that case, the meaning of "limit" being an
> actual limit is valid though it would be misleading for the page size
> because we will not fetch the maximum amount of data that does not exceed
> the page size.
>
> Such a problem is much more visible for "group by" queries with
> aggregation. The paging and limiting mechanism is applied to the rows
> rather than groups, as it has no information about how much memory a single
> group uses. For now, I've approximated a group size as the size of the
> largest participating row.
>
> The problem concerns the allowed interpretation of the size limit
> expressed in bytes: do we want to use this mechanism to let the users
> precisely control the size of the result set, or do we instead want to use
> it to limit the amount of memory used internally for the data and
> prevent problems (assuming that size and row-count restrictions can be used
> simultaneously, in a way that we stop when we reach any of the specified
> limits)?
>
> https://issues.apache.org/jira/browse/CASSANDRA-11745
>
> thanks,
> - - -- --- ----- -------- -------------
> Jacek Lewandowski
>
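
For reference, the group-size approximation described in the quoted message
(a group is counted as the size of its largest participating row) could be
sketched as follows. The function name and the (group key, payload) row
shape are assumptions made for this example, not the actual patch.

```python
from typing import Dict, Iterable, Tuple


def approximate_group_sizes(rows: Iterable[Tuple[str, bytes]]) -> Dict[str, int]:
    """Map each group key to the size in bytes of its largest row (hypothetical)."""
    sizes: Dict[str, int] = {}
    for key, payload in rows:
        # Keep only the maximum row size seen so far for this group.
        sizes[key] = max(sizes.get(key, 0), len(payload))
    return sizes


# Rows are (group key, raw payload) pairs; the values are illustrative only.
rows = [("a", b"12"), ("a", b"12345"), ("b", b"123")]
sizes = approximate_group_sizes(rows)
print(sizes)
```

The estimate ignores what the aggregate actually returns, so it can be far
from the real size of the aggregated result in either direction.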
