[ https://issues.apache.org/jira/browse/CASSANDRA-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295743#comment-13295743 ]
Christian Spriegel commented on CASSANDRA-4304:
-----------------------------------------------

Brandon, thank you for your feedback.

I also see the need for these operator limits, but I think they should be implemented in addition to the client-specified limits I proposed. Here is why:
# An operator limit should throw an exception if too much data is loaded (or perhaps not an exception, but some kind of flag in the result). If the server silently reduced the number of results, the client would not know whether there is simply no more data or whether the result was cut off because of its size. Think of a client asking for fixed-size batches for some processing: the operator would silently break the application by turning on the size limit.
# More important (to me): I have different queries that expect different batch sizes, so I need the application to be able to control the result size. For example, mobile devices need smaller batches than a backend system that calls our middleware.

Is there any reason not to have a client limit?

I agree that adding another limit parameter does not look nice. In Thrift we could reuse the existing limit parameter and use the negative value range for byte limits :-). In CQL/CLI a new keyword might be nicer, though.

... but I digress. Any thoughts? I don't know if it helps, but I would be willing to contribute.

> Add bytes-limit clause to queries
> ---------------------------------
>
>                 Key: CASSANDRA-4304
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4304
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Christian Spriegel
>             Fix For: 1.2
>
>         Attachments: TestImplForSlices.patch
>
>
> Idea is to add a second limit clause to (slice)queries. This would allow easy
> loading of batches, even if content is variable sized.
> Imagine the following use case:
> You want to load a batch of XMLs, where each is between 100bytes and 5MB
> large.
> Currently you can load either
> - a large number of XMLs, but risk OOMs or timeouts
> or
> - a small number of XMLs, and do too many queries where each query usually
> retrieves very little data.
> With Cassandra being able to limit by size and not just count, we could do a
> single query which would never OOM but always return a decent amount of data
> -- with no extra overhead for multiple queries.
> Few thoughts from my side:
> - The limit should be a soft limit, not a hard limit. Therefore it will
> always return at least one row/column, even if that one is larger than the
> limit specifies.
> - HintedHandoffManager:303 is already doing an
> InMemoryCompactionLimit/averageColumnSize to avoid OOM. It could then simply
> use the new limit clause :-)
> - A bytes-limit on a range- or indexed-query should always return a complete
> row
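
For illustration only, the soft bytes-limit discussed above, combined with a flag that tells the client the result was cut short, could look roughly like the following Python sketch. Every name here (`SliceResult`, `slice_with_limits`, `truncated`) is hypothetical; this is not Cassandra's API and not the code in TestImplForSlices.patch, and it assumes a column's size is simply the length of its name plus the length of its value.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical stand-in for a column: (name, value).
Column = Tuple[bytes, bytes]


@dataclass
class SliceResult:
    columns: List[Column]
    truncated: bool  # True if a limit stopped the read before the data ran out


def slice_with_limits(columns: Iterable[Column],
                      count_limit: int,
                      bytes_limit: int) -> SliceResult:
    """Collect columns until the count limit or the soft bytes limit is reached."""
    out: List[Column] = []
    used = 0
    for name, value in columns:
        size = len(name) + len(value)  # assumed size model
        if len(out) >= count_limit:
            return SliceResult(out, True)
        # Soft limit: the first column is always returned, even if it alone
        # exceeds bytes_limit; later columns are only added while they fit.
        if out and used + size > bytes_limit:
            return SliceResult(out, True)
        out.append((name, value))
        used += size
    return SliceResult(out, False)
```

With this shape, a client asking for fixed-size batches can tell "no more data" (`truncated` is false) apart from "stopped because of a limit" (`truncated` is true), which is the distinction the comment asks for.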