[jira] [Comment Edited] (CASSANDRA-19495) Hints not stored after node goes down for the second time

Stefan Miklosovic (Jira) Wed, 27 Mar 2024 03:56:13 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-19495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831292#comment-17831292
 ]


Stefan Miklosovic edited comment on CASSANDRA-19495 at 3/27/24 10:55 AM:
-------------------------------------------------------------------------

I am not sure where the problem is yet but I go through that 
{noformat}
private void flush(Iterator<ByteBuffer> iterator, HintsStore store, HintsBuffer 
buffer) {noformat}
method and check it with me closely to verify my logic here:
 
{noformat}
while (true)
{
    if (iterator.hasNext())
        flushInternal(iterator, store);

    if (!iterator.hasNext())
        break; {noformat}
So, we are in a while loop, we check if the iterator has some elements in it 
and if it does, it will flush it to disk, we see that in flushInternal there is:
{noformat}
try (HintsWriter.Session session = writer.newSession(writeBuffer))
{
    while (iterator.hasNext())
    {
        session.append(iterator.next());
        if (session.position() >= maxHintsFileSize)
            break;
    }
} {noformat}
So again, while we have some entries, we append, then we check if overflowed 
and if we did, we break.

That will close the session and it will return to the original while(true) 
where it gets to the second if
{noformat}
if (!iterator.hasNext())
    break; {noformat}
here it breaks the loop when we wrote all buffers in flushInternal and we have 
nothing to flush anymore.

OK but imagine that we are ever going to append by iterating over the iterator 
which contains just ONE ByteBuffer.

So now the flow is: 

if iterator has next (which it does), go to flushInternal, there it goes into 
while (iterator.hasNext()) (which it does), we just ask again, append that to 
session and check if the position is bigger than maxHintsFileSize. Now if it is 
- if we violated the threshold, we break hence we return from flushInternal. 

But when it breaks, it comes to this again:
{noformat}
if (!iterator.hasNext())
    break; {noformat}
But this will break right? Because if our iterator consisted of one buffer 
only, then it flushed it in flushInternal, but here that iterator will not have 
any new buffers anymore, so it will break. 

The result here is that we successfully wrote a buffer into a hint file where 
we reached maxHintsFileSize, but we entered break, so we never called 
store.closeWriter() and never reached "finally".

In other words, if iterator contains only one buffer, then it will always 
succeed to flush it, but we will never close the store. If we keep writing just 
one buffer every single time, then the hint file will grow beyond any limit? I 
do not see anything yet to contradict this.


was (Author: smiklosovic):
I am not sure where the problem is yet but I go through that 
{noformat}
private void flush(Iterator<ByteBuffer> iterator, HintsStore store, HintsBuffer 
buffer) {noformat}
method and check it with me closely to verify my logic here:
 
{noformat}
while (true)
{
    if (iterator.hasNext())
        flushInternal(iterator, store);

    if (!iterator.hasNext())
        break; {noformat}
So, we are in a while loop, we check if the iterator has some elements in it 
and if it does, it will flush it to disk, we see that in flushInternal there is:
{noformat}
try (HintsWriter.Session session = writer.newSession(writeBuffer))
{
    while (iterator.hasNext())
    {
        session.append(iterator.next());
        if (session.position() >= maxHintsFileSize)
            break;
    }
} {noformat}
So again, while we have some entries, we append, then we check if overflowed 
and if we did, we break.

That will close the session and it will return to the original while(true) 
where it gets to the second if
{noformat}
if (!iterator.hasNext())
    break; {noformat}
here it breaks the loop when we wrote all buffers in flushInternal and we have 
nothing to flush anymore.

OK but imagine that we are ever going to append by iterating over the iterator 
which contains just ONE ByteBuffer.

So now the flow is: 

if iterator has next (which it does), go to flushInternal, there it goes into 
while (iterator.hasNext()) (which it does), we just ask again, append that to 
session and check if the position is bigger than maxHintsFileSize. Now if it is 
- if we violated the threshold, we break hence we return from flushInternal. 

But when it breaks, it comes to this again:
{noformat}
if (!iterator.hasNext())
    break; {noformat}
But this will never happen, right? Because if our iterator consisted of one 
buffer only, then it flushed it in flushInternal, but here that iterator will 
not have any new buffers anymore, so it will break. 

The result here is that we successfully wrote a buffer into a hint file where 
we reached maxHintsFileSize, but we entered break, so we never called 
store.closeWriter() and never reached "finally".

In other words, if iterator contains only one buffer, then it will always 
succeed to flush it, but we will never close the store. If we keep writing just 
one buffer every single time, then the hint file will grow beyond any limit? I 
do not see anything yet to contradict this.

> Hints not stored after node goes down for the second time
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-19495
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19495
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Hints
>            Reporter: Paul Chandler
>            Assignee: Brandon Williams
>            Priority: Normal
>             Fix For: 4.1.x, 5.0.x, 5.x
>
>
> I have a scenario where a node goes down, hints are recorded on the second 
> node and replayed, as expected. If the first node goes down for a second time 
> and time span between the first time it stopped and the second time it 
> stopped is more than the max_hint_window then the hint is not recorded, no 
> hint file is created, and the mutation never arrives at the node after it 
> comes up again.
> I have debugged this and it appears to due to the way hint window is 
> persisted after https://issues.apache.org/jira/browse/CASSANDRA-14309
> The code here: 
> [https://github.com/apache/cassandra/blame/cassandra-4.1/src/java/org/apache/cassandra/service/StorageProxy.java#L2402]
>  uses the time stored in the HintsBuffer.earliestHintByHost map.  This map is 
> based on the UUID of the host, but this does not seem to be cleared when the 
> node is back up, and I think this is what is causing the problem.
>  
> This is in cassandra 4.1.5



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Comment Edited] (CASSANDRA-19495) Hints not stored after node goes down for the second time

Reply via email to