[ 
https://issues.apache.org/jira/browse/CASSANDRA-10835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049335#comment-15049335
 ] 

Jeremiah Jordan commented on CASSANDRA-10835:
---------------------------------------------

bq. Make C* compatible with older versions.

I'd go with that one.  Putting things back the way they used to be will cause the 
least confusion going forward; otherwise, people with existing jobs are going to 
start seeing unexpected behavior in them.

> CqlInputFormat  creates too small splits for map Hadoop tasks
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-10835
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10835
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Artem Aliev
>         Attachments: cassandra-3.0.1-10835.txt
>
>
> CqlInputFormat used the number of rows to define the split size in C* versions < 2.2.
> The default split size was 64K rows.
> {code}
>     private static final int DEFAULT_SPLIT_SIZE = 64 * 1024;
> {code}
> The doc:
> {code}
>  * You can also configure the number of rows per InputSplit with
>  *   ConfigHelper.setInputSplitSize. The default split size is 64k rows.
> {code}
> The new split algorithm assumes that the split size is in bytes, so it creates really 
> small Hadoop map tasks by default (or with old configs).
> There are two ways to fix it:
> 1. Update the doc and increase the default value to something like 16MB.
> 2. Make C* compatible with older versions.
> I like the second option, as it will not surprise people who upgrade from 
> old versions. I do not expect many new users to use Hadoop.
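
To make the size of the problem concrete, here is a rough sketch of the arithmetic. It is not the CqlInputFormat code: the class and method names, the table size, and the average row size are all hypothetical, and only the 64 * 1024 default comes from the issue. It compares how many splits result when the same value is read as a row count versus a byte count.

```java
// Illustrative arithmetic only; not taken from the Cassandra code base.
public class SplitSizeSketch {
    // Default quoted in the issue description.
    static final int DEFAULT_SPLIT_SIZE = 64 * 1024;

    // Number of splits when the split size is interpreted as rows (ceiling division).
    static long splitsByRows(long totalRows, long splitSize) {
        return (totalRows + splitSize - 1) / splitSize;
    }

    // Number of splits when the same value is interpreted as bytes (ceiling division).
    static long splitsByBytes(long totalRows, long avgRowBytes, long splitSize) {
        long totalBytes = totalRows * avgRowBytes;
        return (totalBytes + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long totalRows = 10_000_000L; // hypothetical table size
        long avgRowBytes = 1024L;     // hypothetical average row size

        long byRows = splitsByRows(totalRows, DEFAULT_SPLIT_SIZE);
        long byBytes = splitsByBytes(totalRows, avgRowBytes, DEFAULT_SPLIT_SIZE);

        System.out.println("splits when size means rows:  " + byRows);
        System.out.println("splits when size means bytes: " + byBytes);
        System.out.println("rows per split when size means bytes: "
                + DEFAULT_SPLIT_SIZE / avgRowBytes);
    }
}
```

Under these assumed numbers, an unchanged job configuration goes from about 150 map tasks to over 150,000, each covering only 64 rows, which is the kind of surprise the second option avoids.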



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
