Jeremy Hanna created CASSANDRA-7280:
---------------------------------------

             Summary: Hadoop support not respecting cassandra.input.split.size
                 Key: CASSANDRA-7280
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7280
             Project: Cassandra
          Issue Type: Bug
          Components: Hadoop
            Reporter: Jeremy Hanna


Long ago (0.7), I tried to set the cassandra.input.split.size property and 
never got the value to be respected.  However, the batch size setting was 
enough for what I needed at the time, which was to influence the timeouts.

Now, with the CQL record reader and native paging, users can specify queries, 
potentially with ALLOW FILTERING clauses.  The input split size matters more 
in that case because the server may have to scan through a great many records 
to find the matching ones.  If the user can effectively set the input split 
size, that puts a hard limit on how many records the server will traverse for 
each split.
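
To make the expectation concrete, here is a minimal sketch of the arithmetic 
a user would reasonably assume.  It is not Cassandra's implementation: the 
class SplitSizeSketch and its helpers are hypothetical, a plain Map stands in 
for the Hadoop job configuration, and the row estimate and the fallback value 
of 65536 are assumptions chosen for illustration.  Only the property name 
comes from this report.

```java
import java.util.HashMap;
import java.util.Map;

public class SplitSizeSketch {
    // Property name as quoted in this report.
    static final String SPLIT_SIZE_KEY = "cassandra.input.split.size";

    // Hypothetical helper: read the requested split size from a plain map
    // that stands in for the job configuration, using the given fallback
    // when the property is absent.
    static int requestedSplitSize(Map<String, String> conf, int fallback) {
        String raw = conf.get(SPLIT_SIZE_KEY);
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }

    // Hypothetical helper: if the split size were honored, a range holding
    // roughly estimatedRows rows would be cut into ceil(rows / splitSize)
    // splits, so no single split would cover more than splitSize rows.
    static long expectedSplits(long estimatedRows, int splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        if (estimatedRows < 0) {
            throw new IllegalArgumentException("estimatedRows must not be negative");
        }
        return (estimatedRows + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SPLIT_SIZE_KEY, "10000");

        // Assumed fallback and row estimate, for illustration only.
        int splitSize = requestedSplitSize(conf, 65536);
        long splits = expectedSplits(1_000_000L, splitSize);
        System.out.println("requested split size: " + splitSize);
        System.out.println("expected splits:      " + splits);
    }
}
```

If the number of splits a job actually receives stays the same no matter what 
value is set, that is consistent with the property being ignored or 
overridden somewhere along the way.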

Currently the property appears to be overridden, perhaps in the 
client.describe_splits_ex method on the server side.

It can be argued that users shouldn't be using ALLOW FILTERING clauses in their 
CQL in the first place.  However, it is still a bug that the input split size 
is not honored.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
