[ 
https://issues.apache.org/jira/browse/CASSANDRA-19893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao-Nan Zhu updated CASSANDRA-19893:
------------------------------------
    Description: 
Hi, I’ve encountered a potential performance bottleneck in _ColumnFamilyStore_ 
which could cause indefinite waiting. 

 

The method 
_[snapshotWithoutMemtable|https://github.com/apache/cassandra/blob/02f38208b15b119b3038482c5e36f05c14e2a4cf/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L2007]_
iterates over _ColumnFamilyStore_ instances, creating hard links for each 
instance's _SSTables_ and taking a snapshot. In each iteration, it calls 
_[selectAndReference|https://github.com/apache/cassandra/blob/02f38208b15b119b3038482c5e36f05c14e2a4cf/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1873]_,
which uses a *while (true)* loop to retry obtaining references to _SSTables_ 
until it succeeds. Because the loop has no upper bound, it can wait 
indefinitely, causing _snapshotWithoutMemtable_ to either get stuck or take an 
excessively long time on each _ColumnFamilyStore_ instance. The problem could 
become more severe with a large number of _ColumnFamilyStore_ instances, 
leading to significant delays and possible resource contention.

 

To mitigate this issue, one possible solution would be to introduce a timeout 
or a maximum number of retries for _selectAndReference_. This would prevent 
indefinite spinning and make execution time more predictable.
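
A minimal sketch of what such a bound could look like is given below. It is 
not Cassandra code: the class _BoundedRetry_, the method _tryUntil_, and its 
parameters are hypothetical names made up for illustration, and the only 
assumption carried over from the description is that each attempt signals 
failure by returning null, so the loop must retry.

```java
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Cassandra) that bounds a "retry until non-null" loop. */
public final class BoundedRetry
{
    private BoundedRetry() {}

    /**
     * Calls {@code attempt} until it returns a non-null value, the attempt
     * budget is used up, or the timeout elapses, whichever comes first.
     * At least one attempt is always made.
     *
     * @return the first non-null result, or Optional.empty() if the caller should give up
     */
    public static <T> Optional<T> tryUntil(Supplier<T> attempt, int maxAttempts, long timeout, TimeUnit unit)
    {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be >= 1");

        long deadline = System.nanoTime() + unit.toNanos(timeout);
        for (int i = 0; i < maxAttempts; i++)
        {
            T result = attempt.get();
            if (result != null)
                return Optional.of(result);

            // Overflow-safe comparison, as recommended for System.nanoTime() values.
            if (System.nanoTime() - deadline >= 0)
                break;

            Thread.onSpinWait(); // scheduling hint only; requires Java 9+
        }
        return Optional.empty();
    }
}
```

A caller in the position of _snapshotWithoutMemtable_ would still have to 
decide what an empty result means, for example failing the snapshot with an 
error or skipping that _ColumnFamilyStore_ instance; that choice is outside 
the scope of this sketch.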

 

Please let me know if anything in this analysis is incorrect, and whether such 
an optimization would be worthwhile. Thanks!


> Add timeout or maximum retries for ColumnFamilyStore.selectAndReference
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-19893
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19893
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Snapshots
>            Reporter: Hao-Nan Zhu
>            Priority: Normal



--
This message was sent by Atlassian Jira
(v8.20.10#820010)