[jira] [Commented] (CASSANDRA-5397) Updates to PerRowSecondaryIndex don't use most current values

2013-04-05 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13623467#comment-13623467
 ] 

Marcus Eriksson commented on CASSANDRA-5397:


Pushed a build fix to trunk (basically it only fixes the test case): 
6afbed371c0d12a15a969e4f52ba670998bab282 RowMutations do not take QueryPath in 
trunk

 Updates to PerRowSecondaryIndex don't use most current values 
 --

 Key: CASSANDRA-5397
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5397
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.2.3
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe
Priority: Minor
 Fix For: 1.2.4

 Attachments: 5397_12.txt, 5397-1.2-v3.txt, 5397-1.2-v4.txt, 
 5397_trunk.txt, 5397.txt


 The way that updates to secondary indexes are performed using 
 SecondaryIndexManager.Updater is flawed for PerRowSecondaryIndexes. Unlike 
 PerColumnSecondaryIndexes, which only require the old and new values for a 
 single column, the expectation is that a PerRow indexer can be given just a 
 key which it will use to retrieve the entire row (or as many columns as it 
 requires) and perform its indexing on those columns. As the indexes are 
 updated before the memtable atomic swap occurs, a per-row indexer may only 
 read the previous values for the row, not the new ones that are being 
 written. In the case of an insert, there is no previous value and so nothing 
 is added to the index.
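
 The ordering problem described above can be illustrated with a minimal 
 sketch. It is a toy model, not Cassandra code: RowStore, PerRowIndexer and 
 write() are hypothetical names invented for the illustration, and the 
 "swap" is just a map put. It only shows why an indexer that is handed a key 
 and reads the row back sees nothing when it runs before the new values are 
 visible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; none of these types are Cassandra classes.
final class StaleReadDemo {

    // Stands in for the memtable: row key -> (column name -> value).
    static final class RowStore {
        private final Map<String, Map<String, String>> rows = new HashMap<>();

        Map<String, String> read(String key) {
            Map<String, String> row = rows.get(key);
            return row == null ? Collections.<String, String>emptyMap() : row;
        }

        // Stands in for the atomic swap that makes the new values visible.
        void swapIn(String key, Map<String, String> newRow) {
            rows.put(key, new HashMap<>(newRow));
        }
    }

    // A per-row indexer is given only the key and reads the row back itself.
    static final class PerRowIndexer {
        final List<String> indexed = new ArrayList<>();
        private final RowStore store;

        PerRowIndexer(RowStore store) {
            this.store = store;
        }

        void index(String key) {
            Map<String, String> row = store.read(key);
            if (!row.isEmpty()) {
                indexed.add(key + "=" + row);
            }
        }
    }

    // Performs one insert and returns what ended up in the index.
    static List<String> write(boolean indexBeforeSwap) {
        RowStore store = new RowStore();
        PerRowIndexer indexer = new PerRowIndexer(store);
        Map<String, String> newRow = Collections.singletonMap("name", "alice");

        if (indexBeforeSwap) {
            indexer.index("k1");        // reads the (empty) previous row
            store.swapIn("k1", newRow);
        } else {
            store.swapIn("k1", newRow);
            indexer.index("k1");        // reads the values just written
        }
        return indexer.indexed;
    }

    public static void main(String[] args) {
        System.out.println("index before swap: " + write(true));
        System.out.println("index after swap:  " + write(false));
    }
}
```

 With indexBeforeSwap set to true the insert leaves the index empty, which 
 is the behaviour reported in this issue.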

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-5397) Updates to PerRowSecondaryIndex don't use most current values

2013-04-04 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621969#comment-13621969
 ] 

Sam Tunnicliffe commented on CASSANDRA-5397:


The patch was originally against 1.2, but it needed rebasing after 
CASSANDRA-5395 was committed. I'm attaching 2 new versions, one each for 1.2 
and trunk.

 Updates to PerRowSecondaryIndex don't use most current values 
 --

 Key: CASSANDRA-5397
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5397
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.2.3
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe
Priority: Minor
 Attachments: 5397_12.txt, 5397_trunk.txt, 5397.txt




[jira] [Commented] (CASSANDRA-5397) Updates to PerRowSecondaryIndex don't use most current values

2013-04-04 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13622447#comment-13622447
 ] 

Jonathan Ellis commented on CASSANDRA-5397:
---

Also, remove is only called by compaction, so there will be no commit (so 
adding to deferred is a bad idea).

If we're assuming that PRSI always keeps the index exactly up to date, remove 
can be a no-op.

 Updates to PerRowSecondaryIndex don't use most current values 
 --

 Key: CASSANDRA-5397
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5397
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.2.3
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe
Priority: Minor
 Attachments: 5397_12.txt, 5397-1.2-v3.txt, 5397_trunk.txt, 5397.txt




[jira] [Commented] (CASSANDRA-5397) Updates to PerRowSecondaryIndex don't use most current values

2013-04-04 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13622523#comment-13622523
 ] 

Sam Tunnicliffe commented on CASSANDRA-5397:


Yes, you're right about the {{if( column.isMarkedForDelete()) return }} being a 
regression.

It's down to the PRSI implementation to figure out whether an update is actually 
an update or whether it actually calls for a delete. As the PRSI only has the 
key to work with and is going to be inspecting the whole row anyway, this 
shouldn't be difficult, but it does make the whole SI/PCSI/PRSI hierarchy a bit 
ugly. 

Also, we do/should assume that PRSI always keeps the index exactly up to date, 
so I'm +1 with making remove a no-op there.  
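
As a rough illustration of the two points above (the PRSI decides for itself 
whether a key needs an index entry, and remove can then be a no-op), here is 
a minimal sketch. It is not the committed patch: RowInspectingIndexSketch and 
its methods are hypothetical, and "deleted" is modelled simply as a row with 
no live columns.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of a per-row index; not a Cassandra class.
final class RowInspectingIndexSketch {
    // Base data: row key -> live columns (deleted columns are simply absent).
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    // The "index": row key -> a summary of the row's live column names.
    private final Map<String, String> index = new HashMap<>();

    // Applies a write and then re-indexes the row from its current state.
    void writeRow(String key, Map<String, String> liveColumns) {
        rows.put(key, new TreeMap<>(liveColumns));
        index(key);
    }

    // Given only the key, inspect the whole row and decide what to do.
    void index(String key) {
        Map<String, String> row = rows.get(key);
        if (row == null || row.isEmpty()) {
            index.remove(key);                                // really a delete
        } else {
            index.put(key, String.join(",", row.keySet()));   // insert or update
        }
    }

    // Assumes index() always leaves the entry exactly up to date, so there
    // is nothing left for a later clean-up call to do.
    void remove(String key) {
        // intentionally empty
    }

    boolean isIndexed(String key) {
        return index.containsKey(key);
    }

    public static void main(String[] args) {
        RowInspectingIndexSketch idx = new RowInspectingIndexSketch();
        idx.writeRow("k1", Collections.singletonMap("name", "alice"));
        System.out.println("after insert: " + idx.isIndexed("k1"));
        idx.writeRow("k1", Collections.<String, String>emptyMap());
        System.out.println("after delete: " + idx.isIndexed("k1"));
    }
}
```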

attached v4 for 1.2 (v3 + the no-op remove for PRSI)


 Updates to PerRowSecondaryIndex don't use most current values 
 --

 Key: CASSANDRA-5397
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5397
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.2.3
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe
Priority: Minor
 Attachments: 5397_12.txt, 5397-1.2-v3.txt, 5397-1.2-v4.txt, 
 5397_trunk.txt, 5397.txt




[jira] [Commented] (CASSANDRA-5397) Updates to PerRowSecondaryIndex don't use most current values

2013-04-04 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13622618#comment-13622618
 ] 

Jonathan Ellis commented on CASSANDRA-5397:
---

Odd, I'm seeing the following with v4:

{noformat}
formite:git johnathanellis$ patch -p1 < ~/.JIRAClient/download/5397-1.2-v4.txt 
patching file src/java/org/apache/cassandra/db/AtomicSortedColumns.java
patching file src/java/org/apache/cassandra/db/index/SecondaryIndexManager.java
patch:  malformed patch at line 159: diff --git 
a/test/unit/org/apache/cassandra/SchemaLoader.java 
b/test/unit/org/apache/cassandra/SchemaLoader.java
{noformat}

I committed what I think is the same code based on v3; please double-check it.

 Updates to PerRowSecondaryIndex don't use most current values 
 --

 Key: CASSANDRA-5397
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5397
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.2.3
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe
Priority: Minor
 Attachments: 5397_12.txt, 5397-1.2-v3.txt, 5397-1.2-v4.txt, 
 5397_trunk.txt, 5397.txt




[jira] [Commented] (CASSANDRA-5397) Updates to PerRowSecondaryIndex don't use most current values

2013-04-04 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13622709#comment-13622709
 ] 

Sam Tunnicliffe commented on CASSANDRA-5397:


lgtm, thanks.

 Updates to PerRowSecondaryIndex don't use most current values 
 --

 Key: CASSANDRA-5397
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5397
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.2.3
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe
Priority: Minor
 Fix For: 1.2.4

 Attachments: 5397_12.txt, 5397-1.2-v3.txt, 5397-1.2-v4.txt, 
 5397_trunk.txt, 5397.txt




[jira] [Commented] (CASSANDRA-5397) Updates to PerRowSecondaryIndex don't use most current values

2013-04-04 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13622737#comment-13622737
 ] 

Jeremiah Jordan commented on CASSANDRA-5397:


Does this affect 1.1?  Or is it a problem with the new faster 1.2 indexes?

 Updates to PerRowSecondaryIndex don't use most current values 
 --

 Key: CASSANDRA-5397
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5397
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.2.3
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe
Priority: Minor
 Fix For: 1.2.4

 Attachments: 5397_12.txt, 5397-1.2-v3.txt, 5397-1.2-v4.txt, 
 5397_trunk.txt, 5397.txt




[jira] [Commented] (CASSANDRA-5397) Updates to PerRowSecondaryIndex don't use most current values

2013-04-03 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621283#comment-13621283
 ] 

Jonathan Ellis commented on CASSANDRA-5397:
---

Is this intended for 1.2 or 2.0?  I'm getting lots of conflicts on both.

 Updates to PerRowSecondaryIndex don't use most current values 
 --

 Key: CASSANDRA-5397
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5397
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.2.3
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe
Priority: Minor
 Attachments: 5397.txt




[jira] [Commented] (CASSANDRA-5397) Updates to PerRowSecondaryIndex don't use most current values

2013-04-02 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619808#comment-13619808
 ] 

Sam Tunnicliffe commented on CASSANDRA-5397:


I don't really see how we can do that, especially as PRSI is typically 
implemented outside of C* and the contract we give it is expressed in the 
method signature {{public abstract void index(ByteBuffer rowKey);}} So the 
assumption is that an update to a PRSI will be able to access the entire row at 
index time. 

Changes to AtomicSortedColumns are applied a column at a time, so 
MixedIndexUpdater has a guard to ensure that even when a mutation changes 
multiple columns in the row, the index is only updated once. Obviously though, 
until the last of these column updates occurs, the row is not fully updated. So 
I see 2 options: defer the per-row indexing until we've finished updating the 
row (as in my first patch), or remove the guard and apply the per-row update as 
each column is updated. The second option has the benefit of not changing the 
SIM.Updater API, but is potentially very inefficient.
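
The trade-off between the current guard and the two options can be sketched 
as follows. This is a toy model rather than the real SIM.Updater: the 
Strategy enum and apply() are hypothetical, and it assumes the guarded update 
fires right after the first column is applied. It records the snapshot of 
the row that a per-row indexer would observe on each call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the three behaviours; not Cassandra's updater API.
final class PerRowUpdateStrategies {
    enum Strategy { GUARDED_ONCE, DEFERRED_UNTIL_COMMIT, EVERY_COLUMN }

    // Applies the mutation one column at a time and returns the row
    // snapshots that the per-row indexer saw, one per index call.
    static List<Map<String, String>> apply(Strategy strategy, Map<String, String> mutation) {
        Map<String, String> row = new LinkedHashMap<>();    // the live row
        List<Map<String, String>> seen = new ArrayList<>();
        boolean alreadyIndexed = false;                      // the guard

        for (Map.Entry<String, String> column : mutation.entrySet()) {
            row.put(column.getKey(), column.getValue());
            boolean indexNow = strategy == Strategy.EVERY_COLUMN
                    || (strategy == Strategy.GUARDED_ONCE && !alreadyIndexed);
            if (indexNow) {
                seen.add(new LinkedHashMap<>(row));          // possibly partial row
                alreadyIndexed = true;
            }
        }
        if (strategy == Strategy.DEFERRED_UNTIL_COMMIT) {
            seen.add(new LinkedHashMap<>(row));              // fully updated row
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, String> mutation = new LinkedHashMap<>();
        mutation.put("a", "1");
        mutation.put("b", "2");
        mutation.put("c", "3");
        for (Strategy s : Strategy.values()) {
            System.out.println(s + " -> " + apply(s, mutation));
        }
    }
}
```

In this model the guarded variant indexes once but sees a partial row, the 
deferred variant indexes once and sees the complete row, and the per-column 
variant eventually sees the complete row at the cost of one index call per 
column.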

 Updates to PerRowSecondaryIndex don't use most current values 
 --

 Key: CASSANDRA-5397
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5397
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.2.3
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe
Priority: Minor
 Attachments: 5397.txt




[jira] [Commented] (CASSANDRA-5397) Updates to PerRowSecondaryIndex don't use most current values

2013-04-02 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620266#comment-13620266
 ] 

Jonathan Ellis commented on CASSANDRA-5397:
---

What guard are you talking about?  Have things changed since CASSANDRA-4458 was 
opened?

 Updates to PerRowSecondaryIndex don't use most current values 
 --

 Key: CASSANDRA-5397
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5397
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.2.3
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe
Priority: Minor
 Attachments: 5397.txt




[jira] [Commented] (CASSANDRA-5397) Updates to PerRowSecondaryIndex don't use most current values

2013-04-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618937#comment-13618937
 ] 

Jonathan Ellis commented on CASSANDRA-5397:
---

TBH my preferred fix here would be to make PerRowSI also do lazy updates a la 
CASSANDRA-2897.  Is that possible?

 Updates to PerRowSecondaryIndex don't use most current values 
 --

 Key: CASSANDRA-5397
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5397
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 1.2.3
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe
Priority: Minor
 Attachments: 5397.txt

