mchennupati opened a new issue, #726:
URL: https://github.com/apache/solr-operator/issues/726

   I am restoring a large index (655G) that is currently on Google Cloud 
Storage to a new SolrCloud instance on Kubernetes. I am trying to understand 
how much space I need to allocate to each of my node PVCs. 
   
   I am currently using the Collections API, with the async parameter, to 
restore a collection saved in GCS.
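
   For reference, a minimal sketch of how such an async restore request and 
its status check can be built against the Collections API. The base URL, backup 
name, repository name, location, and request id below are all hypothetical 
placeholders, not values taken from my cluster; only the RESTORE and 
REQUESTSTATUS actions and their parameter names come from the Solr Collections 
API.

```python
from urllib.parse import urlencode


def restore_url(base_url, backup_name, collection, repository, location, request_id):
    """Build a Collections API RESTORE URL that runs asynchronously."""
    params = {
        "action": "RESTORE",
        "name": backup_name,        # name of the backup to restore
        "collection": collection,   # target collection to create
        "repository": repository,   # backup repository configured in solr.xml
        "location": location,       # path of the backup inside the repository
        "async": request_id,        # caller-chosen id used to poll for status
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


def status_url(base_url, request_id):
    """Build the REQUESTSTATUS URL for a previously submitted async request."""
    params = {"action": "REQUESTSTATUS", "requestid": request_id}
    return f"{base_url}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # All values here are hypothetical and only illustrate the request shape.
    base = "http://localhost:8983/solr"
    print(restore_url(base, "mycoll-backup", "mycoll", "gcs-backups", "/backups", "restore-1"))
    print(status_url(base, "restore-1"))
```

   The first URL submits the restore and returns immediately; the second is 
polled with the same request id until the job reports that it has completed or 
failed.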
   
   When I check my disk usage for /var/solr/data on each of the nodes, it looks 
like the du output at the end of this message. So each of them appears to be 
downloading the entire index. I initially allocated 500G to each of the PVCs, 
but that turned out to be too little. I am now retrying with 700G. 
   
   Is this expected behaviour, or am I doing something wrong? I would have 
expected the metadata to have enough information to download the index in parts 
rather than 655G x 3. It has already cost me a fair bit in network costs as I 
iterate :)
   
   In general, how would one restore a large index? I did not find a 
solrrestore resource similar to solrbackups in the Solr Operator CRDs. 
   
   So I ran an async job using the Solr Collections API.
   
   Thanks !
   
   /var/solr/data$ du
   4       ./userfiles
   4       ./backup-restore/gcs-backups/gcscredential/..2024_10_11_06_16_24.1266852566
   4       ./backup-restore/gcs-backups/gcscredential
   8       ./backup-restore/gcs-backups
   12      ./backup-restore
   4       ./filestore
   4       ./mycoll_shard3_replica_n3/data/tlog
   4       ./mycoll_shard3_replica_n3/data/snapshot_metadata
   8       ./mycoll_shard3_replica_n3/data/index
   85744132        ./mycoll_shard3_replica_n3/data/restore.20241011062904489
   85744152        ./mycoll_shard3_replica_n3/data
   85744160        ./mycoll_shard3_replica_n3
   85744192        .
   solr@mycoll-solrcloud-0:/var/solr/data$ du -sh
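
   To read the du figures above in more familiar units, here is a small 
sketch that parses du-style lines and converts the sizes. It assumes du is 
reporting 1 KiB blocks (the GNU default when no block-size option or 
environment variable is set); the helper names are made up for illustration.

```python
def parse_du(text):
    """Parse 'SIZE<whitespace>PATH' lines from du output into {path: size}."""
    sizes = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Skip prompts, blank lines, and anything not starting with a number.
        if len(parts) == 2 and parts[0].isdigit():
            sizes[parts[1].strip()] = int(parts[0])
    return sizes


def kib_to_gib(kib):
    """Convert a size in KiB (assumed du block size) to GiB."""
    return kib / (1024 ** 2)


if __name__ == "__main__":
    sample = """\
8       ./mycoll_shard3_replica_n3/data/index
85744132        ./mycoll_shard3_replica_n3/data/restore.20241011062904489
85744192        .
"""
    for path, kib in parse_du(sample).items():
        print(f"{kib_to_gib(kib):10.2f} GiB  {path}")
```

   Under that block-size assumption, the total for "." at the time of this 
snapshot works out to roughly 82 GiB, almost all of it in the restore 
directory of the shard3 replica.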


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

