[ https://issues.apache.org/jira/browse/CASSANDRA-4789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474171#comment-13474171 ]
Will Oberman commented on CASSANDRA-4789:
-----------------------------------------
There is definitely a problem when using widerows=true with keys that map to
only one column. Here is my patch against 1.1.5 (I also have a one-line patch
to fix the wide rows bug):
112a113
> private ByteBuffer lastKey;
116d116
<
153a154
> //check key == lastKey?
159a161
> lastKey = null;
177a180
> lastKey = (ByteBuffer)reader.getCurrentKey();
185a189,200
> if(lastKey != null && !(key.equals(lastKey))) // last key only had one value
> {
> tuple.append(new DataByteArray(lastKey.array(), lastKey.position()+lastKey.arrayOffset(), lastKey.limit()+lastKey.arrayOffset()));
> for (Map.Entry<ByteBuffer, IColumn> entry : lastRow.entrySet())
> {
> bag.add(columnToTuple(entry.getValue(), cfDef, parseType(cfDef.getComparator_type())));
> }
> tuple.append(bag);
> lastKey = key;
> lastRow = (SortedMap<ByteBuffer,IColumn>)reader.getCurrentValue();
> return tuple;
> }
194a210
> lastKey = null;
551c567
< widerows = Boolean.valueOf(System.getProperty(PIG_WIDEROW_INPUT));
---
> widerows = Boolean.valueOf(System.getenv(PIG_WIDEROW_INPUT));
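
The last hunk swaps System.getProperty for System.getenv. As a rough illustration of why that matters: a JVM system property (set with -D) and an environment variable are separate namespaces, so a flag exported in the shell is invisible to getProperty. The sketch below assumes the constant's value is the string "PIG_WIDEROW_INPUT"; the class and method names are hypothetical and are not part of CassandraStorage.

```java
import java.util.Map;

// Hypothetical demo class; only the name PIG_WIDEROW_INPUT comes from the patch,
// and its string value here is an assumption.
final class WideRowFlagDemo {
    static final String PIG_WIDEROW_INPUT = "PIG_WIDEROW_INPUT";

    // Reads the flag from JVM system properties (java -DPIG_WIDEROW_INPUT=true ...),
    // which is what the unpatched line consults.
    static boolean fromProperty() {
        return Boolean.parseBoolean(System.getProperty(PIG_WIDEROW_INPUT));
    }

    // Reads the flag from a map of environment variables. Passing the map in
    // (instead of calling System.getenv directly) keeps the method easy to test.
    static boolean fromEnv(Map<String, String> env) {
        // parseBoolean(null) is false, so a missing variable means "off".
        return Boolean.parseBoolean(env.get(PIG_WIDEROW_INPUT));
    }

    public static void main(String[] args) {
        System.out.println("system property: " + fromProperty());
        System.out.println("environment:     " + fromEnv(System.getenv()));
    }
}
```

Running this with `export PIG_WIDEROW_INPUT=true` and no -D flag should report the flag as set only through the environment lookup.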

> CassandraStorage.getNextWide produces corrupt data
> --------------------------------------------------
>
> Key: CASSANDRA-4789
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4789
> Project: Cassandra
> Issue Type: Bug
> Components: Hadoop
> Affects Versions: 1.1.5
> Reporter: Will Oberman
> Assignee: Brandon Williams
>
> This took me a while to track down. I'm seeing the problem when the "key
> changes" case happens. The intended behavior (as far as I can tell) when the
> key changes is that the method returns the current tuple and picks up where
> it left off on the next call to getNextWide(). The problem I'm seeing is that
> sometimes the current key advances between method calls and sometimes it
> does not. "Not" is the correct behavior, since the code saves the value into
> an instance variable; when the key advances there is a key/value mismatch
> (the result being that the values for two different keys are glued
> together). I think the problem might be related to keys that only have a
> single column. I'm still trying to track that down to help solve this
> case...
> Maybe this will be clearer if I paste some of the logging I added to the
> class. The log messages are fairly self-documenting (I hope):
> ...lots of previous logging...
> enter getNextWide
> hasNext = true
> set key = dVNhbXAxMzQ3ODM1OA%3D%3D
> lastRow != null
> added 1 items to bag from lastRow
> added 1 items to bag from row
> hasNext = true
> added 1 items to bag from row
> hasNext = true
> added 1 items to bag from row
> hasNext = true
> added 1 items to bag from row
> hasNext = true
> added 1 items to bag from row
> hasNext = true
> added 1 items to bag from row
> hasNext = true
> added 1 items to bag from row
> hasNext = true
> added 1 items to bag from row
> hasNext = true
> key changed, new key = 669392df09572d0045b964bc65f86a2c
> exit getNextWide
> enter getNextWide
> hasNext = true
> //!!!THIS IS THE PROBLEM HERE I THINK!!!
> //!!!Usually the key here == key before "exit getNextWide"!!!
> set key = 5f900ee4bb1850f8cf387cc3d5fc23ca
> //!!! lastRow is data for 669392df09572d0045b964bc65f86a2c !!!
> //!!! but it's being added to key 5f900ee4bb1850f8cf387cc3d5fc23ca !!!
> lastRow != null
> added 1 items to bag from lastRow
> //!!! Here are the real values for 5f900ee4bb1850f8cf387cc3d5fc23ca !!!
> added 1 items to bag from row
> hasNext = true
> added 1 items to bag from row
> hasNext = true
> key changed, new key = 50438549-cdb6-8c44-f93a-d18d7daeffd8
> exit getNextWide
> enter getNextWide
> hasNext = true
> set key = 50438549-cdb6-8c44-f93a-d18d7daeffd8
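
The mismatch in the log above comes from carrying the saved value across calls while the key is re-read from the reader. A minimal sketch of the intended "return on key change, resume on next call" behavior follows, assuming a plain iterator of (key, column) pairs already grouped by key. WideRowGrouper and its types are hypothetical stand-ins, not the CassandraStorage code or the committed fix. The point is only that the carried-over key and value are stored together, so a key with a single column cannot pick up another key's data.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for a wide-row reader loop.
final class WideRowGrouper {
    // Source of (key, column) pairs; pairs with the same key are adjacent.
    private final Iterator<Map.Entry<String, String>> reader;
    // Pair read during the previous call that belongs to the NEXT key.
    // Key and value are kept together so they can never get out of sync.
    private Map.Entry<String, String> pending;

    WideRowGrouper(Iterator<Map.Entry<String, String>> reader) {
        this.reader = reader;
    }

    // Returns the next (key, columns) group, or null when the input is exhausted.
    Map.Entry<String, List<String>> next() {
        Map.Entry<String, String> first = pending;
        pending = null;
        if (first == null && reader.hasNext()) {
            first = reader.next();
        }
        if (first == null) {
            return null;
        }

        String key = first.getKey();
        List<String> bag = new ArrayList<String>();
        bag.add(first.getValue());
        while (reader.hasNext()) {
            Map.Entry<String, String> entry = reader.next();
            if (!entry.getKey().equals(key)) {
                pending = entry; // key changed: save the whole pair for the next call
                break;
            }
            bag.add(entry.getValue());
        }
        return new SimpleEntry<String, List<String>>(key, bag);
    }
}
```

Feeding it the pairs (a,1), (a,2), (b,3), (c,4), (c,5) should yield three groups, with the single-column key b returned on its own.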