[ https://issues.apache.org/jira/browse/CASSANDRA-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799246#comment-13799246 ]
Alex Liu commented on CASSANDRA-6091:
-------------------------------------

Let's wait for the test results on larger data with vnodes enabled. For now, limiting the thread pool should resolve the issue with generating splits. If the data is big enough, there will be multiple splits for each vnode anyway, so merging ranges into pseudo splits doesn't help much.

One potential issue with vnodes is that there could be many small corner splits (the last split of each vnode). For example, with 256 vnodes per node we could end up with around 256 small corner splits per node. With vnodes disabled, those small corner splits would be merged into bigger splits.

As for data locality, it needs more investigation. It depends on the number of splits, the number of tasks run on each node, and how busy each node is. If the test results show it is a bigger issue than we expected, I will implement the merge approach.

> Better Vnode support in hadoop/pig
> ----------------------------------
>
>                 Key: CASSANDRA-6091
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6091
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Alex Liu
>            Assignee: Alex Liu
>
> CASSANDRA-6084 shows there are some issues when running hadoop/pig jobs if
> vnodes are enabled. Also, hadoop performance on vnode-enabled nodes is
> bad because there are so many splits.
> The idea is to combine vnode splits into big pseudo splits so it works as if
> vnodes were disabled for hadoop/pig jobs.
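For illustration only, the merge approach (combining small per-vnode splits into bigger pseudo splits) could look roughly like the Java sketch below. It is hypothetical: `VnodeSplit`, `PseudoSplit`, `SplitCombiner.combine` and `targetRows` are made-up names, not Cassandra's actual Hadoop input-format classes. It assumes each split already knows one local host and an estimated row count, and packs splits greedily per host so that every pseudo split stays local to a single node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a per-vnode input split: the host it is
// local to, its token range, and an estimated row count (not Cassandra's API).
record VnodeSplit(String host, long startToken, long endToken, long estimatedRows) {}

// Hypothetical "pseudo split": several vnode splits on one host handled by one task.
record PseudoSplit(String host, List<VnodeSplit> parts, long estimatedRows) {}

final class SplitCombiner {
    private SplitCombiner() {}

    /**
     * Greedily packs splits that share a host into pseudo splits of at most
     * targetRows estimated rows (a single split larger than the target still
     * becomes its own pseudo split). Host order and input order are preserved.
     */
    static List<PseudoSplit> combine(List<VnodeSplit> splits, long targetRows) {
        // Group by host so each pseudo split stays data-local to one node.
        Map<String, List<VnodeSplit>> byHost = new LinkedHashMap<>();
        for (VnodeSplit s : splits) {
            byHost.computeIfAbsent(s.host(), h -> new ArrayList<>()).add(s);
        }

        List<PseudoSplit> result = new ArrayList<>();
        for (Map.Entry<String, List<VnodeSplit>> e : byHost.entrySet()) {
            List<VnodeSplit> current = new ArrayList<>();
            long rows = 0;
            for (VnodeSplit s : e.getValue()) {
                // Close the current pseudo split if adding s would exceed the target.
                if (!current.isEmpty() && rows + s.estimatedRows() > targetRows) {
                    result.add(new PseudoSplit(e.getKey(), current, rows));
                    current = new ArrayList<>();
                    rows = 0;
                }
                current.add(s);
                rows += s.estimatedRows();
            }
            if (!current.isEmpty()) {
                result.add(new PseudoSplit(e.getKey(), current, rows));
            }
        }
        return result;
    }
}
```

As a usage example under these assumptions: with 256 vnodes on one node and a small corner split of about 1,000 rows per vnode, `combine(splits, 64_000)` would produce 4 pseudo splits of 64 parts each instead of 256 separate tasks. Grouping strictly by host keeps locality, at the cost of tying each pseudo split to a single node's load.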