[ 
https://issues.apache.org/jira/browse/HDDS-12064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Doroszlai updated HDDS-12064:
------------------------------------
    Description: Currently, while checking file links, the bootstrap logic 
compares every file in the OM data directory (snapshot directory, active 
om.db, compaction backup SST files) against the exclude SST file list and the 
files already present in the current tarball by iterating sequentially through 
those entries. If the exclude list or the set of files already in the tarball 
is long (on the order of thousands) and the total number of SST files is on 
the order of millions, the bootstrap can hit read timeouts and might take 
hours. We need to avoid this unnecessary iteration so that the check runs in 
{{O(n)}} instead of {{O(n^2)}}.
(was: Currently while checking file links, the exclude 
sst file list & files already present in the current tarball is checked in the 
entries by sequentially iterating through the entries, for each and every file 
in the om data directory (snapshot directory, active om.db, compaction backup 
sst file). Now if the exclude list or files present in the tarball is really 
long order of 1000s and the total number of sst files are in the order of 
millions, the bootstrap is going to read timeout and might take hours. We need 
to optimize and not perform this unnecessary iteration to avoid this n^2 
operation and do it in O(n))

> Optimize bootstrap logic to reduce loop while checking file links
> -----------------------------------------------------------------
>
>                 Key: HDDS-12064
>                 URL: https://issues.apache.org/jira/browse/HDDS-12064
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Swaminathan Balachandran
>            Assignee: Swaminathan Balachandran
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.0.0
>
>
> Currently, while checking file links, the bootstrap logic compares every 
> file in the OM data directory (snapshot directory, active om.db, compaction 
> backup SST files) against the exclude SST file list and the files already 
> present in the current tarball by iterating sequentially through those 
> entries. If the exclude list or the set of files already in the tarball is 
> long (on the order of thousands) and the total number of SST files is on the 
> order of millions, the bootstrap can hit read timeouts and might take hours. 
> We need to avoid this unnecessary iteration so that the check runs in 
> {{O(n)}} instead of {{O(n^2)}}.
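
The idea described above can be illustrated with a minimal sketch. It is not 
taken from the Ozone code base: the class name LinkCheckSketch, the method 
filesToAdd, and the use of plain file names as keys are all assumptions made 
for illustration (the real check may key on something other than the name). 
The sketch builds one hash set from the exclude list and the files already in 
the tarball, so each file in the data directory costs one expected 
constant-time lookup instead of a scan over both lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names come from the Ozone code base.
final class LinkCheckSketch {

  private LinkCheckSketch() { }

  // Returns the candidates that are neither excluded nor already in the tarball.
  static List<String> filesToAdd(List<String> candidates,
      Collection<String> excluded, Collection<String> alreadyInTarball) {
    // Build the lookup set once, in time linear in the size of both lists.
    Set<String> skip = new HashSet<>(excluded);
    skip.addAll(alreadyInTarball);

    List<String> result = new ArrayList<>();
    for (String file : candidates) {
      // Expected O(1) hash lookup replaces a sequential scan per file.
      if (!skip.contains(file)) {
        result.add(file);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> candidates =
        Arrays.asList("000001.sst", "000002.sst", "000003.sst", "000004.sst");
    List<String> excluded = Arrays.asList("000002.sst");
    List<String> inTarball = Arrays.asList("000004.sst");
    // prints [000001.sst, 000003.sst]
    System.out.println(filesToAdd(candidates, excluded, inTarball));
  }
}
```

With n candidate files and m entries across the two lists, this does roughly 
n + m units of work rather than n * m, at the cost of holding the m entries in 
memory as a set.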



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org
For additional commands, e-mail: issues-h...@ozone.apache.org
