[
https://issues.apache.org/jira/browse/CASSANALYTICS-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yifan Cai updated CASSANALYTICS-102:
------------------------------------
Description: Propose adding a time range filter to the Bulk Reader. The filter
can improve Bulk Reader performance, especially for tables using
TimeWindowCompactionStrategy, when analytics users want to filter out SSTables
outside the required time window. Analytics users will be able to set the start
and end timestamps of the SSTables they are interested in through Spark
options. Internally, a time range filter is created from those options and
passed to SSTableReader. SSTables are filtered out based on their min and max
timestamps, which avoids streaming their data files. (was: Creating this MTC request to add time range filter in Bulk
Reader. This filter can improve the performance of bulk reader especially for
tables using TimeWindowCompactionStrategy, when analytics users want to filter
out SSTables outside the required time window. Analytics users will be able to
set start and end timestamp of SSTable they are interested in with spark
options. Internally a time range filter is created from the options and passed
to SSTableReader. We filter out SSTables with min and max SSTable timestamp and
avoid streaming data files.)
> Add TimeRangeFilter to filter out SSTables outside given time window
> --------------------------------------------------------------------
>
> Key: CASSANALYTICS-102
> URL: https://issues.apache.org/jira/browse/CASSANALYTICS-102
> Project: Apache Cassandra Analytics
> Issue Type: New Feature
> Components: Reader
> Reporter: Saranya Krishnakumar
> Assignee: Saranya Krishnakumar
> Priority: Normal
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Propose adding a time range filter to the Bulk Reader. The filter can
> improve Bulk Reader performance, especially for tables using
> TimeWindowCompactionStrategy, when analytics users want to filter out
> SSTables outside the required time window. Analytics users will be able to
> set the start and end timestamps of the SSTables they are interested in
> through Spark options. Internally, a time range filter is created from those
> options and passed to SSTableReader. SSTables are filtered out based on their
> min and max timestamps, which avoids streaming their data files.
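
For illustration only, the filtering idea in the description can be sketched
as an interval-overlap check. This is not the project's implementation: the
names SSTableStats, TimeRangeFilter and select_sstables are hypothetical, and
the sketch assumes an inclusive [start, end] range expressed in the same units
as the SSTable min/max timestamps (microseconds here).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SSTableStats:
    """Hypothetical per-SSTable metadata: file name plus min/max timestamps."""
    name: str
    min_timestamp: int  # assumed to be microseconds
    max_timestamp: int  # assumed to be microseconds


@dataclass(frozen=True)
class TimeRangeFilter:
    """Hypothetical inclusive time range [start, end]."""
    start: int
    end: int

    def __post_init__(self) -> None:
        if self.start > self.end:
            raise ValueError("start must not be after end")

    def overlaps(self, sstable: SSTableStats) -> bool:
        # Two closed intervals overlap unless one ends before the other begins.
        return (sstable.min_timestamp <= self.end
                and sstable.max_timestamp >= self.start)


def select_sstables(sstables: Iterable[SSTableStats],
                    time_filter: TimeRangeFilter) -> List[SSTableStats]:
    """Keep only the SSTables whose timestamp range overlaps the filter."""
    return [s for s in sstables if time_filter.overlaps(s)]


# Usage: only SSTables overlapping [2000, 3000] would need to be streamed.
candidates = [
    SSTableStats("a-Data.db", 1000, 1500),  # entirely before the range
    SSTableStats("b-Data.db", 1800, 2200),  # overlaps the start of the range
    SSTableStats("c-Data.db", 2500, 2600),  # entirely inside the range
    SSTableStats("d-Data.db", 3000, 4000),  # touches the inclusive end
    SSTableStats("e-Data.db", 3001, 5000),  # entirely after the range
]
selected = select_sstables(candidates, TimeRangeFilter(start=2000, end=3000))
print([s.name for s in selected])  # prints ['b-Data.db', 'c-Data.db', 'd-Data.db']
```

Because the check only needs each SSTable's min and max timestamps, SSTables
that cannot contain data in the requested window can be dropped before any
data file is read. Whether the real filter treats the bounds as inclusive is
an assumption of this sketch.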