Timothy Potter updated SOLR-9961:
    Attachment: SOLR-9961.patch

> RestoreCore needs the option to download files in parallel.
> -----------------------------------------------------------
>                 Key: SOLR-9961
>                 URL: https://issues.apache.org/jira/browse/SOLR-9961
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public) 
>          Components: Backup/Restore
>    Affects Versions: 6.2.1
>            Reporter: Timothy Potter
>         Attachments: SOLR-9961.patch
> My backup to cloud storage (Google Cloud Storage in this case, but I think 
> this is a general problem) takes 8 minutes, but the restore of the same core 
> takes hours. The restore loop in RestoreCore is serial and doesn't allow me 
> to parallelize the expensive part of this operation (the IO from the remote 
> cloud storage service). We need the option to parallelize the download (like 
> distcp). 
> Also, I tried downloading the same directory using gsutil and it was very 
> fast, like 2 minutes. So I know it's not the pipe that's limiting perf here.
> Here's a very rough patch that does the parallelization. We may also want to 
> consider a two-step approach: 1) download in parallel to a temp dir, 2) 
> perform all of the checksum validation against the local temp dir. That 
> will save round trips to the remote cloud storage.
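A rough sketch of the two-step approach described in the issue, in plain Java. This is not the attached SOLR-9961.patch and not Solr's RestoreCore code: `RemoteStore`, `downloadAll`, `findChecksumMismatches`, and the map of expected CRC32 values are hypothetical stand-ins chosen to keep the example self-contained. Real restore code would read through Solr's backup repository abstraction, and Lucene keeps a CRC32 in each index file's footer rather than in a separate map, so the checksum step here is a simplified assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;

public class ParallelRestoreSketch {

    /** Hypothetical stand-in for the remote backup location (e.g. a cloud storage bucket). */
    public interface RemoteStore {
        InputStream open(String fileName) throws IOException;
    }

    /** Step 1: copy every file from the remote store into tempDir, running `threads` downloads at once. */
    public static void downloadAll(RemoteStore store, List<String> fileNames, Path tempDir, int threads)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (String name : fileNames) {
                futures.add(pool.submit(() -> {
                    // Each task streams one remote file to local disk; this is the slow, IO-bound part.
                    try (InputStream in = store.open(name)) {
                        Files.copy(in, tempDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
                    }
                    return null;
                }));
            }
            // Wait for every download; rethrow the first failure as an IOException.
            for (Future<Void> f : futures) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    throw new IOException("Download failed", e.getCause());
                }
            }
        } finally {
            pool.shutdownNow();
        }
    }

    /** Step 2: checksum only the local copies, so validation adds no round trips to the remote store. */
    public static List<String> findChecksumMismatches(Path tempDir, Map<String, Long> expectedCrc32)
            throws IOException {
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<String, Long> entry : expectedCrc32.entrySet()) {
            CRC32 crc = new CRC32();
            crc.update(Files.readAllBytes(tempDir.resolve(entry.getKey())));
            if (crc.getValue() != entry.getValue()) {
                mismatches.add(entry.getKey());
            }
        }
        return mismatches;
    }
}
```

The thread count is a parameter because the useful degree of parallelism likely depends on the storage service. Downloading everything before validating also means a corrupt file is caught before anything touches the live index directory; moving the verified files into place is not shown.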

This message was sent by Atlassian JIRA
