[
https://issues.apache.org/jira/browse/CASSANDRA-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629963#comment-14629963
]
Tyler Hobbs edited comment on CASSANDRA-6492 at 7/16/15 4:27 PM:
-----------------------------------------------------------------
bq. I don't understand, how is that different from your "Perhaps a good first
step is to add support for automatic page size selection"? What did you have in
mind for that? Because the only idea I had to do that from the internal metrics
would be to use the metrics to get an estimated average row size, pick some
presumably hard-coded bytes size target for a page, and compute the actual page
size in rows from that.
Sorry, I should have been more clear. That _is_ what I envisioned for
automatic page size selection. It's not optimal there (due to highly variable
row sizes), but it's basically the server making a best effort attempt, and we
haven't really made any sort of contract with the user. However, I don't think
it's as good an idea to expose that as a "page size in bytes" option to
the user. If the user requests a page size of 1MB but we end up reading 50MB
due to abnormally large rows, that seems like bad behavior. Maybe if we
present it as only a "very soft target" for now, that's okay, but I'm mostly
worried about not matching user expectations. With server-selected page sizes,
there are no user expectations (aside from not OOMing), so it doesn't matter as
much if we're off from our target.
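The estimate-based selection discussed above (average row size from metrics, a hard-coded byte target, page size in rows derived from the two) can be sketched as below. This is a minimal illustration under stated assumptions, not Cassandra code: the class, its methods, and the 1 MiB target are hypothetical. It reproduces the "asked for about 1MB, read about 50MB" case when a range's rows are far larger than the average.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: pick a page size in rows from an estimated average row size. */
final class EstimatedPageSize {
    // Assumed, hard-coded byte target for one page (illustrative, not a Cassandra setting).
    static final long TARGET_PAGE_BYTES = 1L << 20; // 1 MiB

    /** Rows per page = target bytes / average row size, clamped to [1, maxRows]. */
    static int pageSizeInRows(long targetBytes, long avgRowBytes, int maxRows) {
        if (avgRowBytes <= 0) return maxRows; // no usable statistics: fall back to the cap
        long rows = targetBytes / avgRowBytes;
        return (int) Math.max(1, Math.min(rows, maxRows));
    }

    /** Total bytes actually read when a page of pageRows rows is served from these rows. */
    static long bytesRead(List<Long> rowSizes, int pageRows) {
        long total = 0;
        for (int i = 0; i < Math.min(pageRows, rowSizes.size()); i++) {
            total += rowSizes.get(i);
        }
        return total;
    }

    public static void main(String[] args) {
        // The statistics say rows average 1 KiB, so a 1 MiB target becomes 1024 rows.
        int rows = pageSizeInRows(TARGET_PAGE_BYTES, 1024, 10_000);
        System.out.println("page size: " + rows + " rows");
        // The rows in this particular range happen to be ~50 KiB each.
        List<Long> largeRows = Collections.nCopies(rows, 50L * 1024);
        long read = bytesRead(largeRows, rows);
        System.out.println("bytes read: " + read + " (" + read / TARGET_PAGE_BYTES + "x target)");
    }
}
```

Running `main` prints a page size of 1024 rows and 52428800 bytes read, i.e. 50 times the target. That is harmless when the server picks the size on its own, but it is exactly the mismatch that would surprise a user who explicitly asked for 1MB pages.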
bq. Or to put it another way, having the server pick a default is not the
problem we're trying to fix. The problem we're trying to fix is that to pick a
proper page size, you currently have to guess-estimate the average size of your
rows, but we can do a better guess-estimation server side and that's what we
should provide here.
I think we're trying to solve both. For aggregates, users may not even be
aware that the page size is affecting how the aggregate is handled internally,
and that's especially problematic for cqlsh, where the default page size is 100.
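To illustrate why the page size matters for aggregates even though the user only sees one result row, here is a minimal sketch that assumes, as the comment implies, that the aggregate's input is fetched internally using the query's page size. The class and method names are hypothetical; the example only counts page fetches and is not how Cassandra implements aggregation.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: an aggregate whose input is read internally in fixed-size pages. */
final class PagedAggregate {
    final long sum;
    final int pagesFetched;

    private PagedAggregate(long sum, int pagesFetched) {
        this.sum = sum;
        this.pagesFetched = pagesFetched;
    }

    /** Sums all values, fetching them pageSize rows at a time and counting the fetches. */
    static PagedAggregate pagedSum(List<Long> values, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        long sum = 0;
        int pages = 0;
        for (int start = 0; start < values.size(); start += pageSize) {
            // Each iteration stands in for one internal page request.
            List<Long> page = values.subList(start, Math.min(start + pageSize, values.size()));
            pages++;
            for (long v : page) sum += v;
        }
        return new PagedAggregate(sum, pages);
    }

    public static void main(String[] args) {
        List<Long> rows = Collections.nCopies(1_000_000, 1L);
        for (int pageSize : new int[] {100, 5_000}) {
            PagedAggregate r = pagedSum(rows, pageSize);
            System.out.println("page size " + pageSize + ": sum=" + r.sum + ", pages=" + r.pagesFetched);
        }
    }
}
```

The result is identical either way (a sum of 1000000), but with a page size of 100 the million rows take 10000 internal fetches versus 200 with a page size of 5000. The cost is invisible to a cqlsh user who never chose the page size.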
bq. I think we're in agreement that the no-guess-estimate solution is a lot
more involved.
Yes.
bq. And one of the bonuses of directly modifying the protocol to allow a page
size target in bytes (rather than only providing a default mode with hard-coded
target server side) is that once we do implement the more involved
change-the-internals solution, we'll have no additional user-visible change to
make; things will just get auto-magically better and safer.
That does sound like a nice property, I'm just worried about not being able to
meet user expectations when we first expose a page size in bytes.
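For contrast with the estimate-based approach, the "change-the-internals" alternative would stop filling a page once a byte target is reached, rather than guessing a row count up front. The sketch below is a hypothetical illustration of that idea (the class and method names are invented, and row sizes are assumed to be known as rows are read). It also shows the limit of even this approach: a page always holds at least one row, so a single oversized row still exceeds the target.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: build pages by accumulating rows until a byte target is reached. */
final class ByteBoundedPager {
    /**
     * Returns how many rows, starting at offset, go into the next page. Rows are added while
     * the page is still under targetBytes, and the first row is always taken, so a page is
     * never empty and overshoots the target by less than the size of its last row.
     */
    static int nextPageRowCount(long[] rowSizes, int offset, long targetBytes) {
        long bytes = 0;
        int count = 0;
        for (int i = offset; i < rowSizes.length && (count == 0 || bytes < targetBytes); i++) {
            bytes += rowSizes[i];
            count++;
        }
        return count;
    }

    /** Splits all rows into byte-bounded pages and returns the row count of each page. */
    static List<Integer> pageRowCounts(long[] rowSizes, long targetBytes) {
        List<Integer> pages = new ArrayList<>();
        int offset = 0;
        while (offset < rowSizes.length) {
            int n = nextPageRowCount(rowSizes, offset, targetBytes);
            pages.add(n);
            offset += n;
        }
        return pages;
    }

    public static void main(String[] args) {
        long kib = 1024;
        // Mostly 1 KiB rows with one abnormally large (50 MiB) row in the middle.
        long[] rowSizes = {kib, kib, 50 * kib * kib, kib, kib};
        System.out.println(pageRowCounts(rowSizes, 2 * kib)); // [2, 1, 2]
    }
}
```

With a 2 KiB target the small rows pack two to a page, while the 50 MiB row lands alone on a page that is still far over the target. The overshoot is bounded by one row rather than by a bad average, but it is not eliminated.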
> Have server pick query page size by default
> -------------------------------------------
>
> Key: CASSANDRA-6492
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6492
> Project: Cassandra
> Issue Type: New Feature
> Components: API
> Reporter: Jonathan Ellis
> Assignee: Benjamin Lerer
> Priority: Minor
>
> We're almost always going to do a better job picking a page size based on
> sstable stats than users will by guesstimating.