[
https://issues.apache.org/jira/browse/CASSANDRA-19452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yifan Cai updated CASSANDRA-19452:
----------------------------------
Authors: Yifan Cai (was: Yifan Cai)
Test and Documentation Plan: ci; unit
Status: Patch Available (was: Open)
PR: https://github.com/apache/cassandra-analytics/pull/44
CI:
https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19452%2Ftrunk
> [Analytics] Use constant reference time during bulk read process
> ----------------------------------------------------------------
>
> Key: CASSANDRA-19452
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19452
> Project: Cassandra
> Issue Type: Bug
> Components: Analytics Library
> Reporter: Yifan Cai
> Assignee: Yifan Cai
> Priority: Normal
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The bulk reader uses a time provider that returns the current time during
> a read to guide compaction and validation.
> Because the current time differs across Spark executors, rows/cells can
> expire inconsistently. In addition, the post-compaction validation of
> non-expired rows/cells might fail, since they could expire during the
> read, which can take minutes or even hours.
> This could lead to erroneous data omission and job failures.
> The fix is to use a constant reference time that is chosen by the Spark
> driver and distributed to all executors. The same reference time is then
> used for both compaction and validation.
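
The idea in the description can be illustrated with a minimal Java sketch. It is not the code from the linked PR: the names `ReferenceTimeSketch`, `TimeProvider`, `FixedTimeProvider`, and `isExpired` are hypothetical, and the expiry rule (a cell is expired once its expiration time is at or before the reference time) is an assumption made for illustration. The sketch only shows the time being captured once and then reused, so that every check sees the same value no matter when or where it runs.

```java
import java.io.Serializable;

public class ReferenceTimeSketch {

    /** Hypothetical interface: supplies the time used for expiry decisions. */
    public interface TimeProvider extends Serializable {
        long referenceEpochInSeconds();
    }

    /**
     * Holds a time value that never changes after construction.
     * In a Spark job it would be created once on the driver and shipped to
     * the executors as part of the serialized task state (assumption).
     */
    public static final class FixedTimeProvider implements TimeProvider {
        private final long referenceEpochInSeconds;

        public FixedTimeProvider(long referenceEpochInSeconds) {
            this.referenceEpochInSeconds = referenceEpochInSeconds;
        }

        /** Captures the wall clock exactly once. */
        public static FixedTimeProvider captureNow() {
            return new FixedTimeProvider(System.currentTimeMillis() / 1000L);
        }

        @Override
        public long referenceEpochInSeconds() {
            return referenceEpochInSeconds;
        }
    }

    /** Assumed rule: expired once the expiration time is at or before the reference time. */
    public static boolean isExpired(long expirationEpochInSeconds, TimeProvider time) {
        return expirationEpochInSeconds <= time.referenceEpochInSeconds();
    }

    public static void main(String[] args) throws InterruptedException {
        TimeProvider time = FixedTimeProvider.captureNow();
        // A cell that expires one second after the reference time.
        long cellExpiration = time.referenceEpochInSeconds() + 1;

        boolean duringCompaction = isExpired(cellExpiration, time);
        Thread.sleep(1500); // simulate a long read; the wall clock moves past the expiration
        boolean duringValidation = isExpired(cellExpiration, time);

        // Both checks agree because they share the same reference time.
        System.out.println(duringCompaction == duringValidation);
    }
}
```

With a provider that called the wall clock on every check, the second call in `main` could report the cell as expired even though the first did not, which is the inconsistency the description attributes to the current-time provider.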