[jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-11-11 Thread Jeremy Hanna (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148832#comment-13148832
 ] 

Jeremy Hanna commented on CASSANDRA-2855:
-

FWIW, I saw an interesting analogous ticket for HBase storage:
https://issues.apache.org/jira/browse/PIG-2114
It talks about omitNulls and how it's used on both the load and the store side.

 Skip rows with empty columns when slicing entire row
 

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Jeremy Hanna
Assignee: T Jake Luciani
Priority: Minor
  Labels: hadoop
 Fix For: 0.8.8

 Attachments: 2855-v2.txt, 2855-v3.txt, 2855-v4.txt, 2855-v5.txt, 
 v1-0001-CASSANDRA-2855-ignore-ghosts-when-no-predicate-specifi.txt


 We have been finding that range ghosts appear in results from Hadoop via Pig.
 This can also happen when rows have no data for the given slice predicate.
 Either way, it forces a painful amount of defensive checking on the Pig side,
 especially in the case of range ghosts.
 We would like to add an option to skip rows that have no column values.
 That functionality used to exist in core Cassandra but was removed because
 of the performance penalty of the check.  The Hadoop RecordReader is batch
 oriented anyway, so individual row read performance is less of an issue
 there.  We would also make it an optional per-job config parameter, so people
 would not incur the penalty if they are confident there won't be empty rows
 or they don't care.
 It could be a parameter named cassandra.skip.empty.rows that takes true/false.
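
 As a rough illustration of the per-job option proposed above, the sketch
 below reads the flag with a default of false, so jobs that do not opt in pay
 nothing. It is an assumption-laden stand-in: the SkipEmptyRowsOption class is
 hypothetical, and java.util.Properties is used in place of the Hadoop job
 configuration so the example stays self-contained. In a real job the
 equivalent lookup would presumably go through Hadoop's
 Configuration.getBoolean(name, defaultValue).

```java
import java.util.Properties;

/** Hypothetical helper, not part of Cassandra: reads the proposed per-job flag. */
final class SkipEmptyRowsOption {
    // Property name as proposed in the ticket description.
    static final String SKIP_EMPTY_ROWS = "cassandra.skip.empty.rows";

    /** Returns false unless the job explicitly sets the property to "true". */
    static boolean isEnabled(Properties jobConf) {
        return Boolean.parseBoolean(jobConf.getProperty(SKIP_EMPTY_ROWS, "false"));
    }
}
```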

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-11-10 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147879#comment-13147879
 ] 

Brandon Williams commented on CASSANDRA-2855:
-

+1





[jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-11-10 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148007#comment-13148007
 ] 

Hudson commented on CASSANDRA-2855:
---

Integrated in Cassandra-0.8 #398 (See 
[https://builds.apache.org/job/Cassandra-0.8/398/])
Skip empty rows when entire row is requested, redux.
Patch by tjake, reviewed by brandonwilliams for CASSANDRA-2855

brandonwilliams : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1200471
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyRecordReader.java






[jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-11-08 Thread T Jake Luciani (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146313#comment-13146313
 ] 

T Jake Luciani commented on CASSANDRA-2855:
---

Reverted; will submit a new patch.

 Skip rows with empty columns when slicing entire row
 

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor
  Labels: hadoop
 Fix For: 0.8.8

 Attachments: 2855-v2.txt, 2855-v3.txt, 2855-v4.txt, 2855-v5.txt






[jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-11-08 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146376#comment-13146376
 ] 

Hudson commented on CASSANDRA-2855:
---

Integrated in Cassandra-0.8 #395 (See 
[https://builds.apache.org/job/Cassandra-0.8/395/])
Revert CASSANDRA-2855

jake : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1199245
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/hadoop/ColumnFamilyRecordReader.java






[jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-10-06 Thread Jeremy Hanna (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122016#comment-13122016
 ] 

Jeremy Hanna commented on CASSANDRA-2855:
-

+1





[jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-08-22 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089040#comment-13089040
 ] 

Jonathan Ellis commented on CASSANDRA-2855:
---

If we are only skipping when the predicate covers the entire row (which 
is the Right Thing imo), why do we need the configuration setting?  Can't we 
just make it always skip?  Look at it this way: you're giving the same result 
that the user would see anyway if he had a lower tombstone grace period.

 Skip rows with empty columns when slicing entire row
 

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor
  Labels: hadoop
 Fix For: 0.8.5

 Attachments: 2855-v2.txt, 2855-v3.txt, 2855-v4.txt






[jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-08-22 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089045#comment-13089045
 ] 

Jeremy Hanna commented on CASSANDRA-2855:
-

True - wouldn't matter.





[jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-08-03 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078912#comment-13078912
 ] 

Brandon Williams commented on CASSANDRA-2855:
-

skip.empty.results should probably be 'skip.empty.rows' or 'skip.tombstones'.
There also needs to be a check on the predicate to see whether it covers the
entire row: if so, suppress the tombstone; if not, return the empty slice.

 Skip rows with empty columns when slicing entire row
 

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor
  Labels: hadoop
 Fix For: 0.8.4

 Attachments: 2855-v2.txt, 2855-v3.txt






[jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-07-25 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070764#comment-13070764
 ] 

Jeremy Hanna commented on CASSANDRA-2855:
-

Brandon was saying that the empty slice comment only referred to core Cassandra,
so in the CFRR I just skipped any key that didn't have values, hoping that
isSetColumns handles all cases for that.

 Skip rows with empty columns when slicing entire row
 

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor
  Labels: hadoop
 Fix For: 0.7.9, 0.8.3

 Attachments: 2855.txt






[jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-07-05 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059995#comment-13059995
 ] 

Jeremy Hanna commented on CASSANDRA-2855:
-

Is it more expensive/complicated to do it for an empty slice, or is that just 
orthogonal to this since that is handled in a different place?

 Skip rows with empty columns when slicing entire row
 

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Jeremy Hanna
Assignee: Jonathan Ellis
Priority: Minor
  Labels: hadoop
 Fix For: 0.8.2






[jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-07-05 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059996#comment-13059996
 ] 

Jonathan Ellis commented on CASSANDRA-2855:
---

Sylvain points out that doing this at the Thrift layer would break the row 
count contract.  We could still do this at the CFRR level.
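
One reading of the row count contract: a range query that asks for N rows
returns N rows whenever that many exist, so a paging client treats a shorter
page as the end of the range. The sketch below illustrates why filtering is
safer in the reader than on the server under that assumption. Everything in it
is hypothetical (ClientSideSkip, Row, and fetchPage are invented for this
example and are not Cassandra or CFRR APIs); the in-memory list stands in for
the cluster.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of skipping empty rows on the reader side. */
final class ClientSideSkip {
    /** Minimal stand-in for a row returned by a range query. */
    static final class Row {
        final String key;
        final List<String> columns;
        Row(String key, List<String> columns) {
            this.key = key;
            this.columns = columns;
        }
    }

    /** Stand-in for the server: returns up to count rows, empty rows included. */
    static List<Row> fetchPage(List<Row> store, int from, int count) {
        int to = Math.min(store.size(), from + count);
        return new ArrayList<>(store.subList(from, to));
    }

    /** Pages through the store and collects the keys of rows that have columns. */
    static List<String> readNonEmptyKeys(List<Row> store, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> keys = new ArrayList<>();
        int from = 0;
        while (true) {
            List<Row> page = fetchPage(store, from, pageSize);
            for (Row row : page) {
                if (!row.columns.isEmpty()) {
                    keys.add(row.key); // empty rows are dropped here, in the reader
                }
            }
            // A short page means the range is exhausted. This test is only
            // reliable because the server did not filter the page itself.
            if (page.size() < pageSize) {
                return keys;
            }
            from += page.size();
        }
    }
}
```

With a page size of 2 and rows k1 (data), k2 (empty), k3 (empty), k4 (data),
k5 (data), the second page is k3 and k4. Had the server dropped k3, the reader
would have seen a one-row page, concluded the range was finished, and never
requested k5.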





[jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-07-05 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060002#comment-13060002
 ] 

Jonathan Ellis commented on CASSANDRA-2855:
---

bq. is it more expensive/complicated to do it for an empty slice

An empty result for an entire-row slice means the row really will be gone when 
the tombstone expires, so the two are semantically equivalent.  This is not the 
case for a smaller slice: an empty result there could mean there is data in the 
row, just not in the slice you requested, so leaving that row out would be an 
error.
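
The rule described above could be sketched as follows. This is a minimal
illustration, not the patch attached to the ticket: EmptyRowPolicy and its
methods are hypothetical names, and it assumes that a slice whose start and
finish bounds are both empty is the one that covers the entire row.

```java
import java.util.List;

/** Hypothetical policy: when is it safe to drop a row with no columns? */
final class EmptyRowPolicy {
    /** Assumption: empty start and finish bounds denote a slice over the whole row. */
    static boolean coversEntireRow(byte[] start, byte[] finish) {
        return start.length == 0 && finish.length == 0;
    }

    /**
     * An empty result can be dropped only for a whole-row slice. For a narrower
     * slice the row may still hold data outside the requested range.
     */
    static boolean canSkip(List<?> columns, byte[] start, byte[] finish) {
        return columns.isEmpty() && coversEntireRow(start, finish);
    }
}
```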

 Skip rows with empty columns when slicing entire row
 

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Jeremy Hanna
Priority: Minor
  Labels: hadoop
 Fix For: 0.8.2

