[
https://issues.apache.org/jira/browse/CASSANDRA-19452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yifan Cai updated CASSANDRA-19452:
----------------------------------
Fix Version/s: NA
Since Version: NA
Source Control Link:
https://github.com/apache/cassandra-analytics/commit/a13532272051d4e4608f92d53bdd997103e8ea19
Resolution: Fixed
Status: Resolved (was: Ready to Commit)
> [Analytics] Use constant reference time during bulk read process
> ----------------------------------------------------------------
>
> Key: CASSANDRA-19452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19452
> Project: Cassandra
> Issue Type: Bug
> Components: Analytics Library
> Reporter: Yifan Cai
> Assignee: Yifan Cai
> Priority: Normal
> Fix For: NA
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> The bulk reader uses a time provider that returns the current time during
> a read to guide compaction and validation.
> Because the current time differs across Spark executors, rows/cells can
> expire inconsistently. In addition, the validation of non-expired
> rows/cells after compaction might fail, since they could expire during the
> read, which can take minutes or even hours.
> This could lead to false data omission and job failure.
> The fix is to use a constant reference time that is decided by the Spark
> driver and distributed to all executors. The reference time is then used
> for compaction and validation.
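
The idea described above can be illustrated with a minimal sketch. It is not
the code from the linked commit: the names ReferenceTimeSketch, TimeProvider,
FixedTimeProvider, isExpired and captureOnDriver are hypothetical, and the
expiry rule (a cell counts as expired once its local deletion time is at or
before the reference time) is an assumption made for illustration. The point
is only that the time is captured once and then carried, unchanged, by a
serializable object to wherever compaction and validation run.

```java
import java.io.Serializable;

// Illustrative sketch only; all names here are hypothetical, not the library's API.
final class ReferenceTimeSketch {

    /** Supplies the reference time used for expiry decisions. */
    interface TimeProvider extends Serializable {
        int referenceEpochInSeconds();
    }

    /**
     * Holds a value captured once. Because it is serializable and immutable,
     * every executor that receives a copy sees the same reference time.
     */
    static final class FixedTimeProvider implements TimeProvider {
        private static final long serialVersionUID = 1L;
        private final int referenceEpochInSeconds;

        FixedTimeProvider(int referenceEpochInSeconds) {
            this.referenceEpochInSeconds = referenceEpochInSeconds;
        }

        @Override
        public int referenceEpochInSeconds() {
            return referenceEpochInSeconds;
        }
    }

    /** Intended to be called once, on the driver, before the read starts. */
    static TimeProvider captureOnDriver() {
        return new FixedTimeProvider((int) (System.currentTimeMillis() / 1000L));
    }

    /**
     * Assumed expiry rule: expired when the local deletion time is at or
     * before the reference time. The wall clock is never consulted here.
     */
    static boolean isExpired(int localDeletionTimeSeconds, TimeProvider timeProvider) {
        return localDeletionTimeSeconds <= timeProvider.referenceEpochInSeconds();
    }

    public static void main(String[] args) {
        TimeProvider timeProvider = captureOnDriver();
        int reference = timeProvider.referenceEpochInSeconds();
        // The answers depend only on the captured reference time, however long the read takes.
        System.out.println(isExpired(reference - 1, timeProvider));
        System.out.println(isExpired(reference + 3600, timeProvider));
    }
}
```

With a provider like this, compaction and the post-compaction validation
compare against the same value, so a cell that was live during compaction
cannot be seen as expired by the validation step of the same job.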