[ https://issues.apache.org/jira/browse/HDDS-12064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Doroszlai updated HDDS-12064:
------------------------------------
Description: Currently, while checking file links, the exclude SST file list and the files already present in the current tarball are checked by sequentially iterating through their entries for each and every file in the OM data directory (snapshot directory, active om.db, compaction backup SST files). If the exclude list or the list of files already in the tarball is long (on the order of thousands of entries) and the total number of SST files is on the order of millions, the bootstrap will hit a read timeout and might take hours. We need to optimize this and avoid the unnecessary iteration, replacing this {{O(n^2)}} operation with an {{O(n)}} one.
(was: Currently, while checking file links, the exclude SST file list and the files already present in the current tarball are checked by sequentially iterating through their entries for each and every file in the OM data directory (snapshot directory, active om.db, compaction backup SST files). If the exclude list or the list of files already in the tarball is long (on the order of thousands of entries) and the total number of SST files is on the order of millions, the bootstrap will hit a read timeout and might take hours.
We need to optimize this and avoid the unnecessary iteration, replacing this n^2 operation with an O(n) one.)

> Optimize bootstrap logic to reduce loop while checking file links
> -----------------------------------------------------------------
>
>                 Key: HDDS-12064
>                 URL: https://issues.apache.org/jira/browse/HDDS-12064
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Swaminathan Balachandran
>            Assignee: Swaminathan Balachandran
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.0.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
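The quadratic check described in the issue above, and the linear-time alternative it asks for, can be sketched in Java as follows. This is a minimal illustration, not Ozone's actual bootstrap code: the class and method names (BootstrapLinkFilter, filterByScanning, filterWithSets) are hypothetical, and files are assumed to be identified by plain name strings, whereas the real code may key on paths or other file identity. The slow variant calls List.contains inside the loop, so each lookup is a linear scan. The fast variant builds HashSets once, so each lookup is expected constant time.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: which candidate files still need to go into the tarball. */
public final class BootstrapLinkFilter {

  private BootstrapLinkFilter() {
  }

  /**
   * Slow variant: List.contains scans linearly, so checking n candidates
   * against m excluded or already-added entries costs O(n * m).
   */
  public static List<String> filterByScanning(List<String> candidates,
      List<String> excluded, List<String> alreadyInTarball) {
    List<String> added = new ArrayList<>(alreadyInTarball);
    List<String> result = new ArrayList<>();
    for (String file : candidates) {
      if (!excluded.contains(file) && !added.contains(file)) {
        added.add(file);   // record it so a repeated candidate is skipped
        result.add(file);
      }
    }
    return result;
  }

  /**
   * Fast variant: build hash sets once in O(m); each membership check is
   * then expected O(1), so the whole pass is O(n + m).
   */
  public static List<String> filterWithSets(Collection<String> candidates,
      Collection<String> excluded, Collection<String> alreadyInTarball) {
    Set<String> excludedSet = new HashSet<>(excluded);
    Set<String> added = new HashSet<>(alreadyInTarball);
    List<String> result = new ArrayList<>();
    for (String file : candidates) {
      // Set.add returns false if the file is already recorded as added.
      if (!excludedSet.contains(file) && added.add(file)) {
        result.add(file);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> candidates = List.of("000010.sst", "000011.sst", "000012.sst", "CURRENT");
    List<String> excluded = List.of("000011.sst");
    List<String> inTarball = List.of("CURRENT");
    // prints [000010.sst, 000012.sst]
    System.out.println(filterWithSets(candidates, excluded, inTarball));
  }
}
```

Both variants return the same files in the same order; only the cost of each lookup differs. With millions of candidate files and thousands of excluded or already-added entries, that difference is what separates a pass that finishes quickly from one that runs into the read timeout.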