[ 
https://issues.apache.org/jira/browse/CASSANDRA-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295743#comment-13295743
 ] 

Christian Spriegel commented on CASSANDRA-4304:
-----------------------------------------------

Brandon, thank you for your feedback. I also see the need for these 
operator limits, but I think they should be implemented in addition to the 
client-specified limits that I proposed.

Here is why:
# An operator limit should throw an exception if too much data is loaded (or, 
instead of an exception, set some kind of flag in the result). If the server 
silently reduced the number of results, the client would not know whether there 
simply is no more data or whether the result was cut short due to size. Think 
of a client asking for fixed-size batches for some processing: the operator 
would silently break the application by turning on the size limit.
# More importantly (to me): I have different queries that expect different 
batch sizes. Therefore I need the application to be able to control the result 
size. For example, mobile devices need smaller batches than a backend system 
that calls our middleware.
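
The two points above can be illustrated with a minimal sketch. It is not 
Cassandra code: the class ByteLimitedSlice, the Result type, and the 
truncatedBySize flag are hypothetical names chosen for this example. It assumes 
a client-supplied count limit and byte limit, treats the byte limit as a soft 
limit (the first column is always returned), and reports a size cut-off through 
a flag in the result instead of dropping data silently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of Cassandra: every name here is made up.
final class ByteLimitedSlice {

    /** Columns returned by one read, plus a flag telling the client why it ended. */
    static final class Result {
        final List<byte[]> columns;
        final boolean truncatedBySize; // true only if the byte limit stopped the read

        Result(List<byte[]> columns, boolean truncatedBySize) {
            this.columns = columns;
            this.truncatedBySize = truncatedBySize;
        }
    }

    /** Reads columns in order until the count limit or the soft byte limit is hit. */
    static Result read(List<byte[]> source, int countLimit, long byteLimit) {
        List<byte[]> out = new ArrayList<>();
        long bytes = 0;
        for (byte[] column : source) {
            if (out.size() >= countLimit) {
                break; // count limit reached: the client asked for exactly this
            }
            // Soft limit: never cut before the first column, even if it is oversized.
            if (!out.isEmpty() && bytes + column.length > byteLimit) {
                return new Result(out, true);
            }
            out.add(column);
            bytes += column.length;
        }
        return new Result(out, false);
    }
}
```

With such a flag, a client that expects fixed-size batches can tell "no more 
data" apart from "stopped because of size", and each caller (mobile or backend) 
can pass its own byteLimit.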

Is there any reason not to have a client limit? I agree that adding another 
limit parameter does not look nice. In thrift we could reuse the existing 
limit parameter and use the negative value range for byte limits :-). In 
cql/cli a new keyword might be nicer though.
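
As a rough sketch of the negative-range idea, the following shows how a server 
could interpret one signed 32-bit limit value. The LimitDecoder class and its 
Kind/Limit types are hypothetical and only illustrate the encoding; they do not 
correspond to any existing thrift or Cassandra API.

```java
// Hypothetical sketch of decoding a single signed limit parameter.
final class LimitDecoder {

    enum Kind { COUNT, BYTES }

    /** Decoded limit: either a column/row count or a byte budget. */
    static final class Limit {
        final Kind kind;
        final long value;

        Limit(Kind kind, long value) {
            this.kind = kind;
            this.value = value;
        }
    }

    /** Non-negative values keep their current meaning; negative values mean bytes. */
    static Limit decode(int rawLimit) {
        if (rawLimit >= 0) {
            return new Limit(Kind.COUNT, rawLimit);
        }
        // Widen to long before negating so Integer.MIN_VALUE does not overflow.
        return new Limit(Kind.BYTES, -(long) rawLimit);
    }
}
```

Under this assumed encoding, a limit of 100 would still mean 100 columns, while 
a limit of -1048576 would mean roughly 1 MB of data.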

... but I digress. Any thoughts?

I don't know if it helps, but I would be willing to contribute.
                
> Add bytes-limit clause to queries
> ---------------------------------
>
>                 Key: CASSANDRA-4304
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4304
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Christian Spriegel
>             Fix For: 1.2
>
>         Attachments: TestImplForSlices.patch
>
>
> Idea is to add a second limit clause to (slice)queries. This would allow easy 
> loading of batches, even if content is variable sized.
> Imagine the following use case:
> You want to load a batch of XMLs, where each is between 100 bytes and 5 MB 
> in size.
> Currently you can load either
> - a large number of XMLs, but risk OOMs or timeouts
> or
> - a small number of XMLs, and do too many queries where each query usually 
> retrieves very little data.
> With Cassandra being able to limit by size and not just count, we could do a 
> single query which would never OOM but always return a decent amount of data 
> -- with no extra overhead for multiple queries.
> Few thoughts from my side:
> - The limit should be a soft limit, not a hard limit. Therefore it will 
> always return at least one row/column, even if that one is larger than the 
> limit specifies.
> - HintedHandoffManager:303 already computes 
> InMemoryCompactionLimit/averageColumnSize to avoid OOM. It could then simply 
> use the new limit clause :-)
> - A bytes-limit on a range- or indexed-query should always return a complete 
> row


        
