[jira] [Commented] (CASSANDRA-13075) Indexer is not correctly invoked when building indexes over sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830158#comment-15830158 ] Sergio Bossa commented on CASSANDRA-13075: -- +1 > Indexer is not correctly invoked when building indexes over sstables > > > Key: CASSANDRA-13075 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13075 > Project: Cassandra > Issue Type: Bug >Reporter: Sergio Bossa >Assignee: Alex Petrov >Priority: Critical > Attachments: CustomIndexTest.java > > > Following CASSANDRA-12796, {{SecondaryIndexManager#indexPartition()}} calls > each {{Indexer}} {{begin}} and {{finish}} methods multiple times per > partition (depending on the page size), as > {{PartitionIterators#getOnlyElement()}} returns an empty partition even when > the iterator is exhausted. > This leads to bugs for {{Indexer}} implementations doing actual work in those > methods, but even worse, it provides the {{Indexer}} the same input of an > empty partition containing only a non-live partition deletion, as the > {{Indexer#partitionDelete()}} method is *not* actually called. > My proposed solution: > 1) Stop the iteration before the empty partition is returned and ingested > into the {{Indexer}}. > 2) Actually call the {{Indexer#partitionDelete()}} method inside > {{SecondaryIndexManager#indexPartition()}} (which requires to use a filtered > iterator so it actually contains the deletion info). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-13075) Indexer is not correctly invoked when building indexes over sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829995#comment-15829995 ] Alex Petrov commented on CASSANDRA-13075: - I've updated patches for all branches and re-ran CI. Unfortunately, the failures are also present on trunk (and/or corresponding branch), so they're not related to this patch (however situation with tests doesn't make visibility better). |[3.0|https://github.com/ifesdjeen/cassandra/tree/13075-3.0]|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13075-3.0-testall/]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13075-3.0-dtest/]| |[3.X|https://github.com/ifesdjeen/cassandra/tree/13075-3.X]|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13075-3.X-testall/]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13075-3.X-dtest/]| |[3.11|https://github.com/ifesdjeen/cassandra/tree/13075-3.11]|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13075-3.11-testall/]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13075-3.11-dtest/]| |[trunk|https://github.com/ifesdjeen/cassandra/tree/13075-trunk]|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13075-trunk-testall/]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13075-trunk-dtest/]| > Indexer is not correctly invoked when building indexes over sstables > > > Key: CASSANDRA-13075 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13075 > Project: Cassandra > Issue Type: Bug >Reporter: Sergio Bossa >Assignee: Alex Petrov >Priority: Critical > Attachments: CustomIndexTest.java > > > Following CASSANDRA-12796, {{SecondaryIndexManager#indexPartition()}} calls > each {{Indexer}} {{begin}} and {{finish}} methods multiple times per > partition (depending on the page size), as > {{PartitionIterators#getOnlyElement()}} returns an empty partition even when > the iterator is exhausted. > This leads to bugs for {{Indexer}} implementations doing actual work in those > methods, but even worse, it provides the {{Indexer}} the same input of an > empty partition containing only a non-live partition deletion, as the > {{Indexer#partitionDelete()}} method is *not* actually called. > My proposed solution: > 1) Stop the iteration before the empty partition is returned and ingested > into the {{Indexer}}. > 2) Actually call the {{Indexer#partitionDelete()}} method inside > {{SecondaryIndexManager#indexPartition()}} (which requires to use a filtered > iterator so it actually contains the deletion info). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-13075) Indexer is not correctly invoked when building indexes over sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828459#comment-15828459 ] Alex Petrov commented on CASSANDRA-13075: - bq. it is missing a test with range tombstones crossing pages. I've tested it as far as I could with {{1}} to {{5}} page sizes, and multiple tombstones, although didn't have a chance to dig in and write an explanation on why it works the way it does. We're using {{CQLCounter}} for counting, and it counts only live rows, so range tombstone markers are guaranteed to be read together on the same page. You can refer to [this file|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/filter/DataLimits.java#L442] for more info. The test with range tombstones checks if we fetch them correctly disregarding the page size. So if we have: {code} [RT open marker] [RT close marker] | [ row ] | [RT open marker] [RT close marker] {code} So if we do paging, even with a page size 1, we'll get all three of them on same page, since paging would count only the row. I'll post an updated patch with two other nits fixed for all branches. > Indexer is not correctly invoked when building indexes over sstables > > > Key: CASSANDRA-13075 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13075 > Project: Cassandra > Issue Type: Bug >Reporter: Sergio Bossa >Assignee: Alex Petrov >Priority: Critical > Attachments: CustomIndexTest.java > > > Following CASSANDRA-12796, {{SecondaryIndexManager#indexPartition()}} calls > each {{Indexer}} {{begin}} and {{finish}} methods multiple times per > partition (depending on the page size), as > {{PartitionIterators#getOnlyElement()}} returns an empty partition even when > the iterator is exhausted. > This leads to bugs for {{Indexer}} implementations doing actual work in those > methods, but even worse, it provides the {{Indexer}} the same input of an > empty partition containing only a non-live partition deletion, as the > {{Indexer#partitionDelete()}} method is *not* actually called. > My proposed solution: > 1) Stop the iteration before the empty partition is returned and ingested > into the {{Indexer}}. > 2) Actually call the {{Indexer#partitionDelete()}} method inside > {{SecondaryIndexManager#indexPartition()}} (which requires to use a filtered > iterator so it actually contains the deletion info). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-13075) Indexer is not correctly invoked when building indexes over sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15828347#comment-15828347 ] Sergio Bossa commented on CASSANDRA-13075: -- Looks better now [~ifesdjeen]! Just a few more points: 1) As you mentioned, it is missing a test with range tombstones crossing pages. 2) [This|https://github.com/ifesdjeen/cassandra/commit/7aa0c7c41960ed11b35545c49ca7e8f869f76e2f#diff-13cb97758bcb11cce8fc6f4cb1990dd6R729] should use {{pageSize}}. 3) [This|https://github.com/ifesdjeen/cassandra/commit/7aa0c7c41960ed11b35545c49ca7e8f869f76e2f#diff-13cb97758bcb11cce8fc6f4cb1990dd6R785] should be probably replaced with {{StubIndex}} (I've used that other one just as quick hack). > Indexer is not correctly invoked when building indexes over sstables > > > Key: CASSANDRA-13075 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13075 > Project: Cassandra > Issue Type: Bug >Reporter: Sergio Bossa >Assignee: Alex Petrov >Priority: Critical > Attachments: CustomIndexTest.java > > > Following CASSANDRA-12796, {{SecondaryIndexManager#indexPartition()}} calls > each {{Indexer}} {{begin}} and {{finish}} methods multiple times per > partition (depending on the page size), as > {{PartitionIterators#getOnlyElement()}} returns an empty partition even when > the iterator is exhausted. > This leads to bugs for {{Indexer}} implementations doing actual work in those > methods, but even worse, it provides the {{Indexer}} the same input of an > empty partition containing only a non-live partition deletion, as the > {{Indexer#partitionDelete()}} method is *not* actually called. > My proposed solution: > 1) Stop the iteration before the empty partition is returned and ingested > into the {{Indexer}}. > 2) Actually call the {{Indexer#partitionDelete()}} method inside > {{SecondaryIndexManager#indexPartition()}} (which requires to use a filtered > iterator so it actually contains the deletion info). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-13075) Indexer is not correctly invoked when building indexes over sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826571#comment-15826571 ] Alex Petrov commented on CASSANDRA-13075: - > Processing the collected range tombstones You're right. I have overlooked the tombstones processing. It was "functioning" correctly, although not passing elements where I expected them. I remember stepping into it with a debugger, so I made the wrong assumption. I've added proper tests for it now. I'll check the paging tomorrow once again to make sure range tombstones won't span across pages. This also allowed to address the other two points (additional begin/finish calls along with passing correct rows). The last point appears to be a github highlighting glitch, since that's all I have changed in my diff view locally: {code} private final DataLimits pageLimits; -private final DataLimits.Counter counter; +protected final DataLimits.Counter counter; private DecoratedKey currentKey; {code} > Indexer is not correctly invoked when building indexes over sstables > > > Key: CASSANDRA-13075 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13075 > Project: Cassandra > Issue Type: Bug >Reporter: Sergio Bossa >Assignee: Alex Petrov >Priority: Critical > Attachments: CustomIndexTest.java > > > Following CASSANDRA-12796, {{SecondaryIndexManager#indexPartition()}} calls > each {{Indexer}} {{begin}} and {{finish}} methods multiple times per > partition (depending on the page size), as > {{PartitionIterators#getOnlyElement()}} returns an empty partition even when > the iterator is exhausted. > This leads to bugs for {{Indexer}} implementations doing actual work in those > methods, but even worse, it provides the {{Indexer}} the same input of an > empty partition containing only a non-live partition deletion, as the > {{Indexer#partitionDelete()}} method is *not* actually called. > My proposed solution: > 1) Stop the iteration before the empty partition is returned and ingested > into the {{Indexer}}. > 2) Actually call the {{Indexer#partitionDelete()}} method inside > {{SecondaryIndexManager#indexPartition()}} (which requires to use a filtered > iterator so it actually contains the deletion info). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-13075) Indexer is not correctly invoked when building indexes over sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826423#comment-15826423 ] Sergio Bossa commented on CASSANDRA-13075: -- [~ifesdjeen], I believe there are a few problems with your patch: * [Passing|https://github.com/ifesdjeen/cassandra/commit/803d3c6ea6e6852103204c7d0bb46001be191ab2#diff-3f2c8994c4ff8748c3faf7e70958520dR550] the partition columns for the metadata definition is subtly different than passing the columns for the read partition, as the latter could be a subset of the former (at least according to javadocs and a brief code inspection); I'm not sure how much it matters in practice, but that could potentially lead to unwanted index calls. * [Processing|https://github.com/ifesdjeen/cassandra/commit/803d3c6ea6e6852103204c7d0bb46001be191ab2#diff-3f2c8994c4ff8748c3faf7e70958520dR567] the collected range tombstones inside the loop, piggybacking on the fact the loop itself will do one more iteration after exhaustion, doesn't actually work, that is, it seems range tombstones are never processed; this can be easily verified by checking range tombstones [here|https://github.com/ifesdjeen/cassandra/commit/803d3c6ea6e6852103204c7d0bb46001be191ab2#diff-13cb97758bcb11cce8fc6f4cb1990dd6R754] (as this test _should_ check for range tombstones, unless I'm missing something?). * Even if the last point worked, I believe that would lead to a duplicate {{begin}} and {{finish}} call for range tombstones. * [Unused|https://github.com/ifesdjeen/cassandra/commit/803d3c6ea6e6852103204c7d0bb46001be191ab2#diff-71513a85468b7cbc97f1e820b06a20a8R125]? > Indexer is not correctly invoked when building indexes over sstables > > > Key: CASSANDRA-13075 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13075 > Project: Cassandra > Issue Type: Bug >Reporter: Sergio Bossa >Assignee: Alex Petrov >Priority: Critical > Attachments: CustomIndexTest.java > > > Following CASSANDRA-12796, {{SecondaryIndexManager#indexPartition()}} calls > each {{Indexer}} {{begin}} and {{finish}} methods multiple times per > partition (depending on the page size), as > {{PartitionIterators#getOnlyElement()}} returns an empty partition even when > the iterator is exhausted. > This leads to bugs for {{Indexer}} implementations doing actual work in those > methods, but even worse, it provides the {{Indexer}} the same input of an > empty partition containing only a non-live partition deletion, as the > {{Indexer#partitionDelete()}} method is *not* actually called. > My proposed solution: > 1) Stop the iteration before the empty partition is returned and ingested > into the {{Indexer}}. > 2) Actually call the {{Indexer#partitionDelete()}} method inside > {{SecondaryIndexManager#indexPartition()}} (which requires to use a filtered > iterator so it actually contains the deletion info). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-13075) Indexer is not correctly invoked when building indexes over sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15825704#comment-15825704 ] Alex Petrov commented on CASSANDRA-13075: - I've prepared a patch which would address both issues you have described: * we use unfiltered pager now in order to allow seeing partition deletions and therefore notifying the client about the fact that partition was deleted * problem with empty extra page is addressed. It was surfacing itself only when the page size was exactly size of partition. In this case, it tried to fetch another page, was receiving an empty partition in return. We skip this corner case here. In other places it's used it doesn't seem to be a problem Also, it will help to handle range tombstones correctly for indexes as well. |[3.X|https://github.com/ifesdjeen/cassandra/tree/13075-3.X]|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13075-3.X-testall/]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13075-3.X-dtest/]| |[3.0|https://github.com/ifesdjeen/cassandra/tree/13075-3.0]|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13075-3.0-testall/]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13075-3.0-dtest/]| |[trunk|https://github.com/ifesdjeen/cassandra/tree/13075-trunk]|[utest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13075-trunk-testall/]|[dtest|https://cassci.datastax.com/view/Dev/view/ifesdjeen/job/ifesdjeen-13075-trunk-dtest/]| > Indexer is not correctly invoked when building indexes over sstables > > > Key: CASSANDRA-13075 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13075 > Project: Cassandra > Issue Type: Bug >Reporter: Sergio Bossa >Assignee: Alex Petrov >Priority: Critical > Attachments: CustomIndexTest.java > > > Following CASSANDRA-12796, {{SecondaryIndexManager#indexPartition()}} calls > each {{Indexer}} {{begin}} and {{finish}} methods multiple times per > partition (depending on the page size), as > {{PartitionIterators#getOnlyElement()}} returns an empty partition even when > the iterator is exhausted. > This leads to bugs for {{Indexer}} implementations doing actual work in those > methods, but even worse, it provides the {{Indexer}} the same input of an > empty partition containing only a non-live partition deletion, as the > {{Indexer#partitionDelete()}} method is *not* actually called. > My proposed solution: > 1) Stop the iteration before the empty partition is returned and ingested > into the {{Indexer}}. > 2) Actually call the {{Indexer#partitionDelete()}} method inside > {{SecondaryIndexManager#indexPartition()}} (which requires to use a filtered > iterator so it actually contains the deletion info). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-13075) Indexer is not correctly invoked when building indexes over sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15811796#comment-15811796 ] Alex Petrov commented on CASSANDRA-13075: - Yes, sorry, I understood that although didn't have a chance to post my findings here. I've written a simple test that would test different partitions (with/without static rows, different page size: more/less than items in partition), but was unable to reproduce the issue (that we keep calling {{begin}} / {{end}} even after the paging iterator has been exhausted. Could you help me to reproduce it? It might be looking like it, especially if you check for whether or not it was exhausted before actual paging took place. You'd need to check after the `try` block to be sure. {{partitionDelete}} isn't called, you're absolutely right here, and we should address this issue. Do you see these things as two separate issues? > Indexer is not correctly invoked when building indexes over sstables > > > Key: CASSANDRA-13075 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13075 > Project: Cassandra > Issue Type: Bug >Reporter: Sergio Bossa >Assignee: Alex Petrov >Priority: Critical > > Following CASSANDRA-12796, {{SecondaryIndexManager#indexPartition()}} calls > each {{Indexer}} {{begin}} and {{finish}} methods multiple times per > partition (depending on the page size), as > {{PartitionIterators#getOnlyElement()}} returns an empty partition even when > the iterator is exhausted. > This leads to bugs for {{Indexer}} implementations doing actual work in those > methods, but even worse, it provides the {{Indexer}} the same input of an > empty partition containing only a non-live partition deletion, as the > {{Indexer#partitionDelete()}} method is *not* actually called. > My proposed solution: > 1) Stop the iteration before the empty partition is returned and ingested > into the {{Indexer}}. > 2) Actually call the {{Indexer#partitionDelete()}} method inside > {{SecondaryIndexManager#indexPartition()}} (which requires to use a filtered > iterator so it actually contains the deletion info). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-13075) Indexer is not correctly invoked when building indexes over sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15811740#comment-15811740 ] Sergio Bossa commented on CASSANDRA-13075: -- [~ifesdjeen], my description above might have been misleading, as I do not think the actual problem is calling {{start}} and {{finish}} just once: as long as they are called together, it will be fine, even if they're called multiple times for the same partition *but different subsets of rows*. The real problem to me is they are called an additional time even when the partition is exhausted. The solution you proposed could work on paper in terms of avoiding those methods being called multiple times (at all), even if practically speaking could be troublesome, but only for the first part of the problem: what about calling {{Indexer#partitionDelete()}}? > Indexer is not correctly invoked when building indexes over sstables > > > Key: CASSANDRA-13075 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13075 > Project: Cassandra > Issue Type: Bug >Reporter: Sergio Bossa >Assignee: Alex Petrov >Priority: Critical > > Following CASSANDRA-12796, {{SecondaryIndexManager#indexPartition()}} calls > each {{Indexer}} {{begin}} and {{finish}} methods multiple times per > partition (depending on the page size), as > {{PartitionIterators#getOnlyElement()}} returns an empty partition even when > the iterator is exhausted. > This leads to bugs for {{Indexer}} implementations doing actual work in those > methods, but even worse, it provides the {{Indexer}} the same input of an > empty partition containing only a non-live partition deletion, as the > {{Indexer#partitionDelete()}} method is *not* actually called. > My proposed solution: > 1) Stop the iteration before the empty partition is returned and ingested > into the {{Indexer}}. > 2) Actually call the {{Indexer#partitionDelete()}} method inside > {{SecondaryIndexManager#indexPartition()}} (which requires to use a filtered > iterator so it actually contains the deletion info). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-13075) Indexer is not correctly invoked when building indexes over sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794760#comment-15794760 ] Alex Petrov commented on CASSANDRA-13075: - Good find! I might be misunderstanding the issue, but as far as I can say, we have multiple alternatives, if what we need is ensuring that {{Index.Indexer::start}} and {{Index.Indexer::finish}} are called just once per partition: * piggyback the {{readStatic}} boolean, that makes sure we index the static row just once * call them outside of the loop (since essentially we have both {{partitionColumns}} and {{writeGroup}} available before we have a partition page, so it's optional I may have misunderstood the part about {{PartitionIterators.getOnlyElement}}, but it also seems to me that this behaviour will be just fine if we take the {{start}} and {{finish}} out of the loop, since {{insertRow}} for statics will be skipped on further iterations and for partition rows it will be also skipped since partition is empty on exhausted iterator.. > Indexer is not correctly invoked when building indexes over sstables > > > Key: CASSANDRA-13075 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13075 > Project: Cassandra > Issue Type: Bug >Reporter: Sergio Bossa >Assignee: Alex Petrov >Priority: Critical > > Following CASSANDRA-12796, {{SecondaryIndexManager#indexPartition()}} calls > each {{Indexer}} {{begin}} and {{finish}} methods multiple times per > partition (depending on the page size), as > {{PartitionIterators#getOnlyElement()}} returns an empty partition even when > the iterator is exhausted. > This leads to bugs for {{Indexer}} implementations doing actual work in those > methods, but even worse, it provides the {{Indexer}} the same input of an > empty partition containing only a non-live partition deletion, as the > {{Indexer#partitionDelete()}} method is *not* actually called. > My proposed solution: > 1) Stop the iteration before the empty partition is returned and ingested > into the {{Indexer}}. > 2) Actually call the {{Indexer#partitionDelete()}} method inside > {{SecondaryIndexManager#indexPartition()}} (which requires to use a filtered > iterator so it actually contains the deletion info). -- This message was sent by Atlassian JIRA (v6.3.4#6332)