[ https://issues.apache.org/jira/browse/SOLR-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705257#comment-16705257 ]
Tim Owen commented on SOLR-9961:
--------------------------------
We considered using this patch locally, but found that the actual problem was
slow HDFS restores caused by an undersized copy buffer. See SOLR-13029 for our
change to alleviate that. Since we had lots of collections to restore, we
restored those in parallel instead of parallelising the file restore. But the
buffer patch made each file restore about 10x faster, using a 256kB buffer
instead of a 4kB one.
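The buffer-size effect comes down to how many bytes each read call moves. Below is a minimal Java sketch, not the SOLR-13029 patch itself: it copies a stream through a caller-chosen buffer and counts how often the source is asked for data. The class and method names are hypothetical, and the in-memory stream only stands in for a remote (e.g. HDFS) input stream, where each read can carry per-call overhead that a larger buffer amortizes.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedCopy {
    // Buffer sizes mentioned in the comment: 4kB before, 256kB after.
    static final int SMALL_BUFFER = 4 * 1024;
    static final int LARGE_BUFFER = 256 * 1024;

    /** Copies in to out through a buffer of bufferSize bytes; returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        // Each iteration asks the source for at most bufferSize bytes.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    /** Counts read(byte[], int, int) calls, i.e. how often the underlying source is asked for data. */
    static final class CountingInputStream extends FilterInputStream {
        int reads = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            reads++;
            return super.read(b, off, len);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1_000_000]; // stand-in for one index file
        for (int size : new int[] {SMALL_BUFFER, LARGE_BUFFER}) {
            CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(data));
            copy(in, OutputStream.nullOutputStream(), size);
            System.out.println(size + "-byte buffer: " + in.reads + " read calls");
        }
    }
}
```

With the 256kB buffer the same file is moved in far fewer read calls; how much wall-clock time that saves depends on the per-call cost of the real storage client.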
> RestoreCore needs the option to download files in parallel.
> -----------------------------------------------------------
>
> Key: SOLR-9961
> URL: https://issues.apache.org/jira/browse/SOLR-9961
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Backup/Restore
> Affects Versions: 6.2.1
> Reporter: Timothy Potter
> Priority: Major
> Attachments: SOLR-9961.patch, SOLR-9961.patch
>
>
> My backup to cloud storage (Google Cloud Storage in this case, but I think
> this is a general problem) takes 8 minutes, but the restore of the same core
> takes hours. The restore loop in RestoreCore is serial and doesn't allow me
> to parallelize the expensive part of this operation (the IO from the remote
> cloud storage service). We need the option to parallelize the download (like
> distcp).
> Also, I tried downloading the same directory using gsutil and it was very
> fast, like 2 minutes. So I know it's not the pipe that's limiting perf here.
> Here's a very rough patch that does the parallelization. We may also want to
> consider a two-step approach: 1) download in parallel to a temp dir, 2)
> perform all of the checksum validation against the local temp dir. That
> will save round trips to the remote cloud storage.
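The two-step approach described in the issue can be sketched in Java as follows. This is a hedged illustration, not the attached SOLR-9961.patch or Solr's RestoreCore code: RemoteFetcher is a hypothetical stand-in for the backup repository client, the thread count is an arbitrary parameter, and a plain CRC32 over each whole file is an assumed stand-in for whatever checksums the restore actually validates. Step 1 downloads all files concurrently into a local temp directory through a fixed thread pool; step 2 validates checksums against the local copies, so verification never goes back to the remote store.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;

public class ParallelRestore {

    /** Hypothetical abstraction over the remote backup store (GCS, HDFS, ...). */
    @FunctionalInterface
    interface RemoteFetcher {
        InputStream open(String fileName) throws IOException;
    }

    /** Step 1: download every file into tempDir using a fixed-size thread pool. */
    static void downloadAll(RemoteFetcher remote, List<String> files, Path tempDir, int threads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String name : files) {
                futures.add(pool.submit(() -> {
                    try (InputStream in = remote.open(name)) {
                        Files.copy(in, tempDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
                    }
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                try {
                    f.get(); // rethrows the failure of any single download
                } catch (ExecutionException e) {
                    throw new IOException("download failed", e.getCause());
                }
            }
        } finally {
            pool.shutdownNow(); // cancels any downloads still queued after a failure
        }
    }

    /** CRC32 of a local file (assumed stand-in for the real index-file checksum). */
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[256 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                crc.update(buffer, 0, n);
            }
        }
        return crc.getValue();
    }

    /** Step 2: check every local copy against its expected checksum; returns the mismatched names. */
    static List<String> verifyAll(Path tempDir, Map<String, Long> expected) throws IOException {
        List<String> mismatched = new ArrayList<>();
        for (Map.Entry<String, Long> e : expected.entrySet()) {
            if (crc32(tempDir.resolve(e.getKey())) != e.getValue()) {
                mismatched.add(e.getKey());
            }
        }
        return mismatched;
    }

    public static void main(String[] args) throws Exception {
        Path remoteDir = Files.createTempDirectory("remote"); // simulated backup location
        Path tempDir = Files.createTempDirectory("restore");
        List<String> files = List.of("_0.cfs", "_0.si", "segments_1");
        Map<String, Long> expected = new HashMap<>();
        for (String file : files) {
            Files.write(remoteDir.resolve(file), ("contents of " + file).getBytes(StandardCharsets.UTF_8));
            expected.put(file, crc32(remoteDir.resolve(file)));
        }
        RemoteFetcher fetcher = name -> Files.newInputStream(remoteDir.resolve(name));
        downloadAll(fetcher, files, tempDir, 4);
        System.out.println("mismatched: " + verifyAll(tempDir, expected));
    }
}
```

Running main restores three small files and prints `mismatched: []`. Bounding the pool size keeps the number of concurrent remote reads under control, which can matter for rate-limited object stores.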
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)