[
https://issues.apache.org/jira/browse/IGNITE-28074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roman Puchkovskiy updated IGNITE-28074:
---------------------------------------
Description:
There is a piece of log:
{noformat}
[2026-03-03T11:59:00,187][INFO ][Thread-37][LoggingRocksDbFlushListener]
Starting rocksdb flush process [name='table data log', nodeName='ihpt_t_3344',
reason=WRITE_BUFFER_FULL]
{noformat}
...
{noformat}
[2026-03-03T11:59:18,069][INFO ][Thread-40][LoggingRocksDbFlushListener]
Finishing rocksdb flush process [name='table data log', nodeName='ihpt_t_3344',
duration=17881ms]
[2026-03-03T11:59:18,084][INFO ][Thread-41][LoggingRocksDbFlushListener]
Starting rocksdb flush process [name='table data log', nodeName='ihpt_t_3344',
reason=WRITE_BUFFER_FULL]
[2026-03-03T11:59:18,084][INFO ][Thread-42][LoggingRocksDbFlushListener]
Finishing rocksdb flush process [name='table data log', nodeName='ihpt_t_3344',
duration=17896ms]
{noformat}
We see two flushes: the first takes almost 18 seconds, while the second is almost instantaneous, yet it reports almost the same duration.
# It seems that onFlushBegin() of the second flush ran concurrently with onFlushCompleted() of the same flush, so the latter took its start moment from the first flush (before the second onFlushBegin() had a chance to update it)
# We should probably look at the jobId (and also log it) to avoid the race
# We deliberately block onFlushBegin() to hold the flush from starting until our syncer does its job. We need to make sure that this still works as intended in the current RocksDB version
# We should also log the file name of the SST; this will help to figure out why there are sometimes more SST files than the number of times our listener was notified about a flush start
> Possible race in LoggingRocksDbFlushListener
> --------------------------------------------
>
> Key: IGNITE-28074
> URL: https://issues.apache.org/jira/browse/IGNITE-28074
> Project: Ignite
> Issue Type: Bug
> Reporter: Roman Puchkovskiy
> Priority: Major
> Labels: ignite-3
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)