Add hadoop support option to skip rows with empty columns
---------------------------------------------------------

                 Key: CASSANDRA-2855
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
             Project: Cassandra
          Issue Type: Improvement
          Components: Hadoop
            Reporter: Jeremy Hanna
            Assignee: Jeremy Hanna


We have been finding that range ghosts appear in results from Hadoop via Pig.  
Empty rows can also show up when a row has no data for the given slice 
predicate.  Either way, this forces a painful amount of defensive checking on 
the Pig side, especially in the case of range ghosts.

We would like to add an option to skip rows that have no column values.  
That functionality used to exist in core Cassandra but was removed because of 
the performance penalty of the check.  With the check in the Hadoop 
RecordReader, however, the work is batch oriented anyway, so individual row 
read performance isn't as much of an issue.  We would also make it an optional 
config parameter for each job, so people wouldn't have to incur that penalty 
if they are confident there won't be any empty rows, or if they don't care.

The parameter could be called cassandra.skip.empty.rows and take a value of 
true or false.
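
As a rough illustration of the proposed behavior, here is a minimal Java 
sketch.  It is not the actual RecordReader code: the class and method names 
are hypothetical, java.util.Properties stands in for the Hadoop job 
configuration, and the default of false is an assumption chosen so that jobs 
which do not set the flag keep the current behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch only; not the actual Cassandra RecordReader code.
class EmptyRowSkipSketch {
    // Property name proposed in this issue. The "false" default is an assumption.
    static final String SKIP_EMPTY_ROWS = "cassandra.skip.empty.rows";

    // Reads the per-job flag; jobs that do not set it keep the current behavior.
    static boolean skipEmptyRows(Properties jobConf) {
        return Boolean.parseBoolean(jobConf.getProperty(SKIP_EMPTY_ROWS, "false"));
    }

    // Returns the row keys a reader would emit, in input order.
    // Each row maps column name -> value; a range ghost appears as a key
    // whose column map is empty.
    static List<String> emittedKeys(Map<String, Map<String, String>> rows,
                                    Properties jobConf) {
        boolean skip = skipEmptyRows(jobConf);
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            if (skip && row.getValue().isEmpty()) {
                continue; // drop range ghosts and rows with nothing in the slice
            }
            keys.add(row.getKey());
        }
        return keys;
    }
}
```

With the flag unset, every key is emitted, including those with no columns.  
With cassandra.skip.empty.rows set to true, only rows that have at least one 
column reach the job, so the Pig side no longer needs its own checks.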

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
