[https://issues.apache.org/jira/browse/CASSANDRA-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329194#comment-15329194]
Giampaolo commented on CASSANDRA-10273:
---------------------------------------
Sorry for the delay on this issue. I've tried to better understand the problem,
and I have some observations to discuss. I studied points 3 and 4 a bit to
understand how to combine them. For these points, "scan" means a call to
{{SSTableLister.list}} at
[https://github.com/apache/cassandra/blob/fed476f9c049128674841d1c46b868979352b1a5/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L412-L414]
{code:title=src/java/org/apache/cassandra/db/ColumnFamilyStore.java|borderStyle=solid}
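// List the sstables in the data directories, skipping temporary files
Directories.SSTableLister sstableFiles =
    directories.sstableLister(Directories.OnTxnErr.IGNORE).skipTemporary(true);
// list() performs the directory scan; the resulting sstables are then opened
Collection<SSTableReader> sstables =
    SSTableReader.openAll(sstableFiles.list().entrySet(), metadata);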
{code}
and
[https://github.com/apache/cassandra/blob/fed476f9c049128674841d1c46b868979352b1a5/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L579-L581]
{code:title=src/java/org/apache/cassandra/db/ColumnFamilyStore.java|borderStyle=solid}
// Scan again, this time including backups, to collect sstable generations
Directories directories = new Directories(metadata, initialDirectories);
Directories.SSTableLister lister =
    directories.sstableLister(Directories.OnTxnErr.IGNORE).includeBackups(true);
List<Integer> generations = new ArrayList<Integer>();
for (Map.Entry<Descriptor, Set<Component>> entry : lister.list().entrySet())
{
    // ... (loop body elided; see the linked source)
}
{code}
The two scans use different parameters ({{skipTemporary(true)}} in the first,
{{includeBackups(true)}} in the second), and because of how {{SSTableLister}}
works internally, no additional filter can be applied after a scan. In fact,
{{SSTableLister}} is something like a one-shot builder for a
{{Map<Descriptor, Set<Component>>}} (please correct me if this is a flight of
fancy).
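To make the "one-shot builder" point concrete, here is a minimal sketch of that pattern. The class {{OneShotLister}}, its file-name conventions ({{tmp-}} for temporary files, {{backups/}} for backups) and its exception message are hypothetical, modeled only on the behavior described above and not copied from Cassandra. Options are set fluently, {{list()}} filters once, and any later attempt to change the options fails, so a differently filtered view needs a new lister (which, in the real code, means a new directory scan).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical one-shot lister: options are configured fluently,
 * list() computes the filtered result once, and afterwards the
 * options can no longer be changed.
 */
final class OneShotLister
{
    private final List<String> files;   // simulated directory contents
    private boolean skipTemporary = false;
    private boolean includeBackups = false;
    private List<String> result;        // null until list() is called

    OneShotLister(List<String> files)
    {
        this.files = files;
    }

    OneShotLister skipTemporary(boolean b)
    {
        checkNotListed();
        skipTemporary = b;
        return this;
    }

    OneShotLister includeBackups(boolean b)
    {
        checkNotListed();
        includeBackups = b;
        return this;
    }

    /** Applies the configured filters on the first call and caches the result. */
    List<String> list()
    {
        if (result == null)
        {
            List<String> out = new ArrayList<>();
            for (String f : files)
            {
                if (skipTemporary && f.startsWith("tmp-"))
                    continue; // hypothetical marker for temporary files
                if (!includeBackups && f.startsWith("backups/"))
                    continue; // hypothetical location of backups
                out.add(f);
            }
            result = Collections.unmodifiableList(out);
        }
        return result;
    }

    private void checkNotListed()
    {
        if (result != null)
            throw new IllegalStateException("list() has already been called");
    }
}
```

For example, after {{lister.skipTemporary(true).list()}} a call to {{lister.includeBackups(true)}} throws, which is why the second scan cannot simply reuse the first lister with an extra filter.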
My tentative conclusion is that the scans differ enough that I did not see a
simple way to reuse the existing classes to avoid one of them, and changing
{{SSTableLister}} is out of the question.
But I'd like to ask more experienced developers whether I'm missing something
and there is a smarter way to accomplish this task that I did not see.
As a by-product of my investigation, I noticed that the {{Directories}} class is
instantiated many times during startup with the same parameters, even though it
is effectively immutable: all fields are final and are passed via the
constructor. Another small optimization could be to reduce the number of these
instances by caching them.
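The caching idea can be sketched as follows, assuming the cache is keyed on the full set of constructor arguments. {{ImmutableDirs}}, {{DirsCache}} and the two string parameters are hypothetical stand-ins; the real {{Directories}} constructor, as in the snippet above, takes {{metadata}} and {{initialDirectories}}, and whatever goes into the key must have value-based {{equals}}/{{hashCode}} (a plain Java array, for instance, compares by identity only). {{ConcurrentHashMap.computeIfAbsent}} applies the constructor at most once per key, so concurrent callers share one instance.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical immutable class standing in for Directories: all fields final, set in the constructor. */
final class ImmutableDirs
{
    static final AtomicInteger CONSTRUCTED = new AtomicInteger(); // counts real instantiations

    final String keyspace;
    final String table;

    ImmutableDirs(String keyspace, String table)
    {
        this.keyspace = keyspace;
        this.table = table;
        CONSTRUCTED.incrementAndGet();
    }
}

/** Hypothetical cache: one shared instance per distinct set of constructor arguments. */
final class DirsCache
{
    /** Cache key with value-based equality over the constructor arguments. */
    private static final class Key
    {
        final String keyspace;
        final String table;

        Key(String keyspace, String table)
        {
            this.keyspace = keyspace;
            this.table = table;
        }

        @Override
        public boolean equals(Object o)
        {
            if (!(o instanceof Key))
                return false;
            Key k = (Key) o;
            return keyspace.equals(k.keyspace) && table.equals(k.table);
        }

        @Override
        public int hashCode()
        {
            return Objects.hash(keyspace, table);
        }
    }

    private final ConcurrentMap<Key, ImmutableDirs> cache = new ConcurrentHashMap<>();

    ImmutableDirs get(String keyspace, String table)
    {
        // computeIfAbsent constructs at most once per key, even with concurrent callers
        return cache.computeIfAbsent(new Key(keyspace, table),
                                     k -> new ImmutableDirs(k.keyspace, k.table));
    }
}
```

One open question with such a cache is invalidation (for example when a table is dropped); the sketch ignores it.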
> Reduce number of data directory scans during startup
> ----------------------------------------------------
>
> Key: CASSANDRA-10273
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10273
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Robert Stupp
> Assignee: Giampaolo
> Priority: Minor
> Labels: lhf
>
> ATM we scan each data directory four times. We could easily reduce that to
> two, maybe even to one.
> 1. pre-flight (startup tests) scrub
> 2. pre-flight (startup tests) sstable min version
> 3. {{ColumnFamilyStore.createColumnFamilyStore}}
> 4. {{ColumnFamilyStore.<init>}} (if {{loadSSTables==true}})
> The first two pre-flight tests could be combined into one, and 3 and 4 could
> also be combined, as both appear in closely related code paths.
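The reporter's idea of merging the two pre-flight tests can be sketched as a single directory walk that hands every file to several checks. This is a minimal sketch assuming each check can be written as an independent per-file callback; {{SinglePassScan}} and {{scanOnce}} are hypothetical names, and the real scrub and minimum-version checks may need more context than a single path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: walk a directory tree once and run every check on each regular file. */
final class SinglePassScan
{
    /** Returns the number of regular files visited during the single walk. */
    static int scanOnce(Path root, List<Consumer<Path>> checks) throws IOException
    {
        List<Path> files;
        // Files.walk holds directory handles open, so close the stream when done
        try (Stream<Path> paths = Files.walk(root))
        {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files)
            for (Consumer<Path> check : checks) // every check sees every file
                check.accept(file);
        return files.size();
    }
}
```

With this shape, adding another per-file check does not add another walk of the data directory.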