[
https://issues.apache.org/jira/browse/CASSANDRA-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Ellis updated CASSANDRA-2184:
--------------------------------------
Remaining Estimate: 4h
Original Estimate: 4h
> Returning split length of 0 confuses Pig
> ----------------------------------------
>
> Key: CASSANDRA-2184
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2184
> Project: Cassandra
> Issue Type: Bug
> Components: Hadoop
> Affects Versions: 0.6
> Reporter: Jonathan Ellis
> Assignee: Brandon Williams
> Priority: Minor
> Fix For: 0.7.3
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> Matt Kennedy reports on the user list,
> bq. There is a new feature in Pig 0.8 that will try to reduce the number of
> splits used to speed up the whole job. Since the ColumnFamilyInputFormat
> lists the input size as zero, this feature eliminates all of the splits
> except for one.
> bq. The workaround is to disable this feature for jobs that use
> CassandraStorage by setting -Dpig.splitCombination=false in the pig_cassandra
> script.
> bq. However, we wanted to keep splitCombination on because it is a useful
> optimization for a lot of our use cases, so I went digging for the least
> intrusive way to keep the split combiner on, but also prevent it from
> combining splits that read from Cassandra. My solution, which you are
> welcome to critique, is to change line 65 of
> http://svn.apache.org/viewvc/cassandra/trunk/src/java/org/apache/cassandra/hadoop/ColumnFamilySplit.java
> such that it returns Long.MAX_VALUE instead of zero.
> I looked into actually returning the number of keys in the split, but the
> Hadoop javadoc says "Get the size of the split, so that the input splits can
> be sorted by size". Since our splits should be very close in size, it doesn't
> seem worth an extra round trip to the host servers to get more accurate
> numbers. Returning MAX_VALUE seems good enough.
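
To illustrate why the reported length matters, here is a minimal, self-contained sketch. It does not use the Hadoop or Pig APIs: the Split class and the greedy combine() method are hypothetical stand-ins, and the 128 MB cap is an assumed value, not Pig's actual setting or algorithm. It only shows that a size-capped combiner merges splits that report a length of 0 into a single group, while splits that report Long.MAX_VALUE stay separate.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitLengthSketch {
    /** Hypothetical stand-in for an input split; only the reported length matters. */
    public static final class Split {
        private final long reportedLength;
        public Split(long reportedLength) { this.reportedLength = reportedLength; }
        public long getLength() { return reportedLength; }
    }

    /** Builds n splits that all report the same length. */
    public static List<Split> uniform(int n, long length) {
        List<Split> splits = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            splits.add(new Split(length));
        }
        return splits;
    }

    /**
     * Toy greedy combiner (an assumption, not Pig's real algorithm): packs
     * consecutive splits into one group while the group's total reported
     * length stays at or below maxCombinedSize.
     */
    public static List<List<Split>> combine(List<Split> splits, long maxCombinedSize) {
        List<List<Split>> groups = new ArrayList<>();
        List<Split> current = new ArrayList<>();
        long currentSize = 0;
        for (Split s : splits) {
            long len = s.getLength();
            // Written as a subtraction so that Long.MAX_VALUE cannot overflow a sum.
            boolean fits = len <= maxCombinedSize - currentSize;
            if (!current.isEmpty() && !fits) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(s);
            currentSize += len;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        long cap = 128L * 1024 * 1024; // assumed cap, for illustration only
        // Four splits reporting length 0 collapse into one group.
        System.out.println(combine(uniform(4, 0L), cap).size());             // prints 1
        // Four splits reporting Long.MAX_VALUE are never combined.
        System.out.println(combine(uniform(4, Long.MAX_VALUE), cap).size()); // prints 4
    }
}
```

In this toy model, reporting Long.MAX_VALUE makes each split exceed any finite cap on its own, which is the effect the proposed change relies on.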