[ 
https://issues.apache.org/jira/browse/CASSANDRA-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799246#comment-13799246
 ] 

Alex Liu commented on CASSANDRA-6091:
-------------------------------------

Let's wait for the test results on the larger data set with vnodes enabled. 
For now, limiting the thread pool should resolve the issue with generating 
splits. If the data is big enough, there will be multiple splits for each vnode, 
so merging ranges into pseudo splits doesn't help. One potential issue 
with vnodes is that there could be many small corner splits (the last 
split for each vnode). E.g. with 256 vnodes per node, we could end up 
with around 256 small corner splits. If we disable vnodes, those small corner 
splits will be merged into bigger splits. 
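
To make the corner-split arithmetic concrete, here is a rough sketch. The 
function name, the row counts, and the split size are made-up numbers for 
illustration, not measurements and not the actual input format code. It 
counts how many splits a set of token ranges produces and how many of 
those are partially filled corner splits.

```python
import math

def count_splits(range_sizes, split_size):
    """Return (total_splits, corner_splits) for the given range sizes.

    A corner split is the last, partially filled split of a range.
    Both arguments are row counts; the values used below are hypothetical.
    """
    total = 0
    corner = 0
    for rows in range_sizes:
        if rows <= 0:
            continue  # an empty range produces no split
        total += math.ceil(rows / split_size)
        if rows % split_size != 0:
            corner += 1  # the last split of this range is not full
    return total, corner

# Hypothetical node: 256 vnodes with 100,000 rows each, 65,536 rows per split.
vnode_ranges = [100_000] * 256
print(count_splits(vnode_ranges, 65_536))         # (512, 256)
# The same data treated as one contiguous range, as with vnodes disabled.
print(count_splits([sum(vnode_ranges)], 65_536))  # (391, 1)
```

With these assumed numbers, every vnode contributes one corner split, while 
the single merged range has only one.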

As for data locality, we need more investigation. It's related to the number of 
splits, the number of tasks running on each node, and how busy each node is. 

If the testing results show it's a bigger issue than we expected, I will 
implement the merge approach.
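
For reference, one possible shape of the merge approach, as a sketch only. 
The function name, the tuple layout, and the row-count threshold are all 
assumptions for illustration and not the patch itself. It groups the vnode 
ranges owned by the same node into pseudo splits until each reaches a 
target size.

```python
from collections import defaultdict

def merge_into_pseudo_splits(vnode_ranges, target_rows):
    """Group vnode ranges owned by the same node into pseudo splits.

    vnode_ranges: iterable of (node, range_id, estimated_rows) tuples.
    Returns a list of (node, [range_id, ...], total_rows) tuples.
    All names and the size heuristic are hypothetical.
    """
    by_node = defaultdict(list)
    for node, range_id, rows in vnode_ranges:
        by_node[node].append((range_id, rows))

    pseudo_splits = []
    for node, ranges in by_node.items():
        current, current_rows = [], 0
        for range_id, rows in ranges:
            current.append(range_id)
            current_rows += rows
            if current_rows >= target_rows:
                # This pseudo split is big enough; start a new one.
                pseudo_splits.append((node, current, current_rows))
                current, current_rows = [], 0
        if current:
            # Leftover ranges form one final, smaller pseudo split per node.
            pseudo_splits.append((node, current, current_rows))
    return pseudo_splits

# Hypothetical input: node "a" owns five small ranges, node "b" owns one.
ranges = [("a", i, 100) for i in range(5)] + [("b", 5, 100)]
for split in merge_into_pseudo_splits(ranges, 250):
    print(split)
```

Grouping by owning node keeps each pseudo split local to one node, which 
matters for the data locality question above.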



> Better Vnode support in hadoop/pig
> ----------------------------------
>
>                 Key: CASSANDRA-6091
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6091
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Alex Liu
>            Assignee: Alex Liu
>
> CASSANDRA-6084 shows there are some issues when running hadoop/pig jobs if 
> vnodes are enabled. Also, Hadoop performance on vnode-enabled nodes is 
> bad because there are so many splits.
> The idea is to combine vnode splits into big pseudo splits so that it works 
> as if vnodes were disabled for hadoop/pig jobs.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
