[ 
https://issues.apache.org/jira/browse/CASSANDRA-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629963#comment-14629963
 ] 

Tyler Hobbs edited comment on CASSANDRA-6492 at 7/16/15 4:27 PM:
-----------------------------------------------------------------

bq. I don't understand, how is that different from your "Perhaps a good first 
step is to add support for automatic page size selection"? What did you have in 
mind for that? Because the only idea I had to do that from the internal metrics 
would be to use the metrics to get an estimated average row size, pick some 
presumably hard-coded byte-size target for a page, and compute the actual page 
size in rows from that.

Sorry, I should have been clearer.  That _is_ what I envisioned for 
automatic page size selection.  It's not optimal there (due to highly variable 
row sizes), but it's basically the server making a best-effort attempt, and we 
haven't really made any sort of contract with the user. However, I don't think 
it's as good an idea if we expose that as a "page size in bytes" option to 
the user.  If the user requests a page size of 1MB but we end up reading 50MB 
due to abnormally large rows, that seems like bad behavior.  Maybe if we 
present it as only a "very soft target" for now, that's okay, but I'm mostly 
worried about not matching user expectations.  With server-selected page sizes, 
there are no user expectations (aside from not OOMing), so it doesn't matter as 
much if we're off from our target.
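
To make the estimate and its failure mode concrete, here is a rough sketch of 
the arithmetic. It is not Cassandra code: the function names, the clamping 
bounds (min_rows, max_rows), and the fallback for a missing estimate are all 
assumptions made for illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def page_size_in_rows(target_bytes, est_avg_row_bytes,
                      min_rows=1, max_rows=10_000):
    """Convert a byte target into a row count from an estimated mean row size.

    min_rows and max_rows are hypothetical safety bounds, not real settings.
    """
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    if est_avg_row_bytes <= 0:
        # No usable estimate (e.g. no stats yet): fall back to the smallest page.
        return min_rows
    rows = int(target_bytes // est_avg_row_bytes)
    return max(min_rows, min(max_rows, rows))


def bytes_actually_read(page_rows, actual_row_bytes):
    """Bytes read for one page if every row has the given actual size."""
    return page_rows * actual_row_bytes


# Stats say rows average 1 KiB, so a 1 MiB target becomes a 1024-row page.
rows = page_size_in_rows(1 * MIB, 1 * KIB)

# If the rows in this page are actually 50 KiB each, the page reads 50 MiB.
overshoot = bytes_actually_read(rows, 50 * KIB)
```

With these assumed numbers, a request for 1 MiB ends up reading 50 MiB, which 
is the mismatch with user expectations described above.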

bq. Or to put it another way, having the server pick a default is not the 
problem we're trying to fix. The problem we're trying to fix is that to pick a 
proper page size, you currently have to guess-estimate the average size of your 
rows, but we can do a better guess-estimation server side and that's what we 
should provide here.

I think we're trying to solve both.  For aggregates, users may not even be 
aware that the page size is affecting how the aggregate is handled internally, 
and that's especially problematic for cqlsh, where the default page size is 100.
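
As a rough illustration of why a small default hurts aggregates, the sketch 
below counts how many internal pages an aggregate would need, assuming the 
aggregate simply pages through all matching rows at the given page size. The 
function name and the row counts are hypothetical.

```python
def internal_page_count(total_rows, page_size_rows):
    """Number of pages needed to cover total_rows at a fixed page size."""
    if page_size_rows <= 0:
        raise ValueError("page_size_rows must be positive")
    # Integer ceiling division, so a partial last page still counts.
    return (total_rows + page_size_rows - 1) // page_size_rows


# An aggregate over one million rows at a page size of 100 (the cqlsh default)
pages_small = internal_page_count(1_000_000, 100)

# The same aggregate at a hypothetical page size of 5000 rows
pages_large = internal_page_count(1_000_000, 5000)
```

Under that assumption the page size of 100 needs 10,000 internal pages, 
against 200 for the larger page size.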

bq. I think we're in agreement that the no-guess-estimate solution is a lot 
more involved.

Yes.

bq. And one of the bonuses of directly modifying the protocol to allow a page 
size target in bytes (rather than only providing a default mode with a hard-coded 
server-side target) is that once we do implement the more involved 
change-the-internals solution, we'll have no additional user-visible change to 
make; things will just get auto-magically better and safer.

That does sound like a nice property; I'm just worried about not being able to 
meet user expectations when we first expose a page size in bytes.


was (Author: thobbs):
bq. I don't understand, how is that different from your "Perhaps a good first 
step is to add support for automatic page size selection"? What did you have in 
mind for that? Because the only idea I had to do that from the internal metrics 
would be to use the metrics to get an estimated average row size, pick some 
presumably hard-coded byte-size target for a page, and compute the actual page 
size in rows from that.

Sorry, I should have been clearer.  That _is_ what I envisioned for 
automatic page size selection.  It's not optimal there (due to highly variable 
row sizes), but it's basically the server making a best-effort attempt, and we 
haven't really made any sort of contract with the user. However, I don't think 
it's as good an idea if we expose that as a "page size in bytes" option to 
the user.  If the user requests a page size of 1MB but we end up reading 50MB 
due to abnormally large rows, that seems like bad behavior.  Maybe if we 
present it as only a "very soft target" for now, that's okay, but I'm mostly 
worried about not matching user expectations.  With internal paging for 
aggregates, there are no user expectations (aside from not OOMing), so it 
doesn't matter as much if we're off from our target.

bq. Or to put it another way, having the server pick a default is not the 
problem we're trying to fix. The problem we're trying to fix is that to pick a 
proper page size, you currently have to guess-estimate the average size of your 
rows, but we can do a better guess-estimation server side and that's what we 
should provide here.

I think we're trying to solve both.  For aggregates, users may not even be 
aware that the page size is affecting how the aggregate is handled internally, 
and that's especially problematic for cqlsh, where the default page size is 100.

bq. I think we're in agreement that the no-guess-estimate solution is a lot 
more involved.

Yes.

bq. And one of the bonuses of directly modifying the protocol to allow a page 
size target in bytes (rather than only providing a default mode with a hard-coded 
server-side target) is that once we do implement the more involved 
change-the-internals solution, we'll have no additional user-visible change to 
make; things will just get auto-magically better and safer.

That does sound like a nice property; I'm just worried about not being able to 
meet user expectations when we first expose a page size in bytes.

> Have server pick query page size by default
> -------------------------------------------
>
>                 Key: CASSANDRA-6492
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6492
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API
>            Reporter: Jonathan Ellis
>            Assignee: Benjamin Lerer
>            Priority: Minor
>
> We're almost always going to do a better job picking a page size based on 
> sstable stats, than users will guesstimating.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
