[jira] [Updated] (CASSANDRA-19452) [Analytics] Use constant reference time during bulk read process

Yifan Cai (Jira) Thu, 29 Feb 2024 10:30:06 -0800


     [ 
https://issues.apache.org/jira/browse/CASSANDRA-19452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Yifan Cai updated CASSANDRA-19452:
----------------------------------
                        Authors: Yifan Cai  (was: Yifan Cai)
    Test and Documentation Plan: ci; unit
                         Status: Patch Available  (was: Open)

PR: https://github.com/apache/cassandra-analytics/pull/44
CI: 
https://app.circleci.com/pipelines/github/yifan-c/cassandra-analytics?branch=CASSANDRA-19452%2Ftrunk

> [Analytics] Use constant reference time during bulk read process
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-19452
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19452
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Analytics Library
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The bulk reader uses a time provider that returns the current time during 
> the read to guide compaction and validation.
> Because the current time differs across Spark executors, rows/cells can 
> expire inconsistently. In addition, the validation of non-expired 
> rows/cells after compaction can fail, since they may expire during the 
> read, which can take minutes or even hours.
> This can lead to data being wrongly omitted and to job failures.
> The fix is to use a constant reference time that is chosen by the Spark 
> driver and distributed to all executors. That reference time is then used 
> for compaction and validation.
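
The difference between a current-time provider and a constant reference time, as described in the issue above, can be illustrated with a minimal Java sketch. It is not taken from the patch: `ReferenceTimeSketch`, `TimeProvider`, `currentTime`, `constant`, and `isExpired` are hypothetical names, the `<=` expiry rule is an assumption, and a plain array stands in for the wall clock. In a real Spark job the constant value would be chosen once on the driver and shipped to the executors; the sketch only models that by capturing the value once.

```java
import java.util.function.LongSupplier;

// Hypothetical illustration only; none of these names come from cassandra-analytics.
public class ReferenceTimeSketch {

    // Reports the time, in epoch seconds, that expiry checks compare against.
    interface TimeProvider {
        long referenceEpochSeconds();
    }

    // Problematic variant: every call re-reads the clock, so the answer drifts.
    static TimeProvider currentTime(LongSupplier clockSeconds) {
        return clockSeconds::getAsLong;
    }

    // Fixed variant: the value is captured once and never changes afterwards.
    static TimeProvider constant(long referenceEpochSeconds) {
        return () -> referenceEpochSeconds;
    }

    // Assumed rule: a cell is expired when its expiration time is at or before the reference time.
    static boolean isExpired(long expiresAtEpochSeconds, TimeProvider provider) {
        return expiresAtEpochSeconds <= provider.referenceEpochSeconds();
    }

    public static void main(String[] args) {
        long[] fakeClock = {1_000L};   // simulated wall clock, in seconds
        long cellExpiresAt = 1_500L;   // the cell's TTL runs out at t = 1500

        TimeProvider moving = currentTime(() -> fakeClock[0]);
        TimeProvider fixed = constant(fakeClock[0]);   // decided once, then shared

        boolean movingBefore = isExpired(cellExpiresAt, moving);
        boolean fixedBefore = isExpired(cellExpiresAt, fixed);

        fakeClock[0] = 2_000L;         // the read runs long; wall-clock time advances

        boolean movingAfter = isExpired(cellExpiresAt, moving);
        boolean fixedAfter = isExpired(cellExpiresAt, fixed);

        System.out.println("current-time provider: " + movingBefore + " -> " + movingAfter);
        System.out.println("constant provider:     " + fixedBefore + " -> " + fixedAfter);
    }
}
```

With the current-time provider the same cell flips from live to expired partway through the read, which is the inconsistency the issue describes; with the constant provider the answer stays the same for the whole job.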



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

