Tim Owen commented on SOLR-13029:

Not sure - I can see someone might want parallelised file copies as well, so 
that ticket is still valid, I think. It probably depends on how many collections 
you have to restore: if (like us) you have many collections, we just kick them 
off in parallel and let each one work through its files in series. But if you 
had only one or two large collections, the change proposed there might serve 
you better.
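The strategy described above could be sketched roughly as follows. This is only an illustration, not Solr code: `restoreCollection` is a hypothetical stand-in for issuing a restore request for one collection, which then copies its files serially.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRestore {
    // Hypothetical stand-in for a per-collection restore call; the real
    // operation would copy the collection's files from HDFS in series.
    static String restoreCollection(String name) {
        return "restored " + name;
    }

    public static void main(String[] args) throws Exception {
        // Kick off one restore per collection in parallel; each task works
        // through its own files serially.
        List<String> collections = Arrays.asList("logs", "events", "metrics");
        ExecutorService pool = Executors.newFixedThreadPool(collections.size());
        List<Future<String>> results = new ArrayList<>();
        for (String c : collections) {
            results.add(pool.submit(() -> restoreCollection(c)));
        }
        for (Future<String> f : results) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
```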

> Allow HDFS backup/restore buffer size to be configured
> ------------------------------------------------------
>                 Key: SOLR-13029
>                 URL: https://issues.apache.org/jira/browse/SOLR-13029
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Backup/Restore, hdfs
>    Affects Versions: 7.5, 8.0
>            Reporter: Tim Owen
>            Assignee: Mikhail Khludnev
>            Priority: Major
>             Fix For: 8.0, 7.7, master (9.0)
>         Attachments: SOLR-13029.patch, SOLR-13029.patch, SOLR-13029.patch
> There's a default hardcoded buffer size of 4096 bytes in the HDFS code, 
> which means in particular that restoring a backup from HDFS takes a long 
> time: copying multi-GB files from HDFS with a buffer as small as 4096 bytes 
> is very inefficient. We changed this to 256 kB in our local production build 
> and saw a 10x speed improvement when restoring a backup. The attached patch 
> simply makes this size configurable via a command-line setting, much like 
> several other buffer size values.
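To illustrate why the buffer size matters, here is a minimal stream-copy sketch with a configurable buffer, defaulting to the 4096 bytes the ticket describes. The system property name `solr.hdfs.buffer.size` is an assumption for illustration, not necessarily the setting used in the actual patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferedCopy {
    // Assumed property name, not necessarily the one in the patch;
    // defaults to the hardcoded 4096 bytes the ticket describes.
    static final int BUFFER_SIZE =
        Integer.getInteger("solr.hdfs.buffer.size", 4096);

    // Copy in to out using the given buffer size; a larger buffer means
    // fewer read/write round trips, which is what drives the speedup on
    // multi-GB files.
    static long copy(InputStream in, OutputStream out, int bufferSize)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // 1 MiB of test data
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), out, 256 * 1024);
        System.out.println(copied + " bytes copied");
    }
}
```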

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org