[ 
https://issues.apache.org/jira/browse/CASSANDRA-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317090#comment-14317090
 ] 

mck edited comment on CASSANDRA-6091 at 2/11/15 10:18 PM:
----------------------------------------------------------

The approach in the patch is to allow multiple token ranges per split.
We do this with our custom input formats, and it is (very) effective in that it 
means splitSize is honoured.

Handling multiple token ranges per split requires, for example, the code 
change found in CqlRecordReader, whereby the reader must iterate over both 
rows and tokenRanges.
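
To illustrate the nested iteration, here is a minimal sketch of a reader that walks every token range in a split and, within each, every row. It is not the code from the patch: MultiRangeRowIterator and the fetchRows function are hypothetical names, and fetchRows merely stands in for whatever per-range query the real reader issues.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Hypothetical sketch: presents the rows of several token ranges as one stream. */
final class MultiRangeRowIterator<R, T> implements Iterator<T> {
    // Outer loop: the token ranges carried by a single split.
    private final Iterator<R> ranges;
    // Assumption: stands in for the per-range query of a real record reader.
    private final Function<R, Iterator<T>> fetchRows;
    // Inner loop: the rows of the range currently being read.
    private Iterator<T> currentRows = Collections.emptyIterator();

    MultiRangeRowIterator(List<R> tokenRanges, Function<R, Iterator<T>> fetchRows) {
        this.ranges = tokenRanges.iterator();
        this.fetchRows = fetchRows;
    }

    @Override
    public boolean hasNext() {
        // Move on to the next range whenever the current one is exhausted;
        // ranges that yield no rows are skipped.
        while (!currentRows.hasNext() && ranges.hasNext()) {
            currentRows = fetchRows.apply(ranges.next());
        }
        return currentRows.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return currentRows.next();
    }
}
```

The caller only sees a flat sequence of rows, so code consuming the reader does not need to know how many token ranges the split contains.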

The grouping of token ranges by common location sets, so that splits again 
honour the splitSize, happens in 
AbstractColumnFamilyInputFormat.collectSplits(..)
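
As a rough sketch of the grouping idea (not the actual collectSplits code), the example below buckets token ranges by their set of replica locations and then packs each bucket into splits of at most splitSize rows. SplitGrouping, Range, Split and the per-range row estimate are all hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SplitGrouping {
    /** Hypothetical token range: an estimated row count plus the nodes that hold it. */
    static final class Range {
        final String id;
        final long rows;
        final Set<String> locations;

        Range(String id, long rows, Set<String> locations) {
            this.id = id;
            this.rows = rows;
            this.locations = locations;
        }
    }

    /** Hypothetical split: several ranges that all share one location set. */
    static final class Split {
        final Set<String> locations;
        final List<Range> ranges = new ArrayList<>();
        long rows;

        Split(Set<String> locations) {
            this.locations = locations;
        }
    }

    static List<Split> group(List<Range> ranges, long splitSize) {
        // Step 1: bucket ranges by location set; the ranges need not be adjacent.
        Map<Set<String>, List<Range>> byLocations = new LinkedHashMap<>();
        for (Range r : ranges) {
            byLocations.computeIfAbsent(new TreeSet<>(r.locations), k -> new ArrayList<>()).add(r);
        }
        // Step 2: within each bucket, fill a split until adding the next range
        // would exceed splitSize, then start a new one. A single range larger
        // than splitSize still gets a split of its own.
        List<Split> splits = new ArrayList<>();
        for (Map.Entry<Set<String>, List<Range>> e : byLocations.entrySet()) {
            Split current = null;
            for (Range r : e.getValue()) {
                if (current == null || current.rows + r.rows > splitSize) {
                    current = new Split(e.getKey());
                    splits.add(current);
                }
                current.ranges.add(r);
                current.rows += r.rows;
            }
        }
        return splits;
    }
}
```

Because every range in a split shares the same location set, the split can still be scheduled on a node that holds all of its data.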

Token ranges do not need to be adjacent.
Everything in this patch is done client-side.







> Better Vnode support in hadoop/pig
> ----------------------------------
>
>                 Key: CASSANDRA-6091
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6091
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Alex Liu
>            Assignee: Alex Liu
>
> CASSANDRA-6084 shows there are some issues when running hadoop/pig jobs if 
> vnodes are enabled. Also, the hadoop performance of vnode-enabled nodes is 
> bad because there are so many splits.
> The idea is to combine vnode splits into big pseudo splits so that it works 
> as if vnodes were disabled for the hadoop/pig job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
