Zoltan:
 Finally got a chance to answer you.  I *think* I understand what you are 
getting at…

 First, some numbers - recalling that each of these nodes is one storage device:
Node1: 358,000,000+ files totaling 430 TB of primary occupied space
Node2: 302,000,000+ files totaling 82 TB of primary occupied space
Node3: 79,000,000+ files totaling 75 TB of primary occupied space
Node4: 1,000,000+ files totaling 75 TB of primary occupied space
Node5: 17,000,000+ files totaling 42 TB of primary occupied space
  There are more, but I think this answers your initial question.

 Restore requests are handled by the local system admin or, for lack of a 
better description, the data admin.  (Basically, the research area has a person 
dedicated to all the various data issues related to research grants, from 
including the proper verbiage in grant requests to making sure the necessary 
protections are in place.)

  We try to make it as simple as we can, because we concentrate all the data 
in one node per storage device (usually a NAS).  So restores are usually done 
directly from the node - while all backups are done through proxies.  
Generally, the restores are done without permissions so that the appropriate 
permissions can be applied to the restored data.  (Oftentimes, the data is 
restored so a different user or set of users can work with it, so the original 
permissions aren't useful.)
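
  To make that concrete, here is a minimal sketch of such a restore with the 
Spectrum Protect backup-archive client.  The paths, owner, group, and mode 
below are hypothetical stand-ins for whatever the data admin actually applies:

  # Restore to an alternate location; the original permissions are not kept,
  # fresh ones are applied in the next step (hypothetical paths)
  dsmc restore "/isilon/projectA/*" /restore/projectA/ -subdir=yes

  # Apply permissions appropriate for the new set of users
  chown -R newowner:research /restore/projectA
  chmod -R 750 /restore/projectA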

  There are some exceptions - of course, as we work at universities, there are 
always exceptions - and these we handle as best we can by providing proxy nodes 
with restricted privileges.
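
  For reference, proxy authority is granted on the Spectrum Protect server and 
then used from the client via ASNODENAME.  A rough sketch, with hypothetical 
node names:

  # On the server (dsmadmc): let the agent node act on behalf of the data node
  grant proxynode target=RESEARCH_DATA agent=RESTORE_PROXY

  # On the agent's client: operate on the data node's file spaces
  dsmc restore "/isilon/projectA/*" /restore/projectA/ -asnodename=RESEARCH_DATA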

  Let me know if I can provide more,
Bob


Robert Talda
EZ-Backup Systems Engineer
Cornell University
+1 607-255-8280
r...@cornell.edu


> On Jul 11, 2018, at 3:59 PM, Zoltan Forray <zfor...@vcu.edu> wrote:
> 
> Robert,
> 
> Thanks for the insight/suggestions.  Your scenario is similar to ours but
> on a larger scale when it comes to the amount of data/files to process,
> thus the issue (assuming such since you didn't list numbers).  Currently we
> have 91 ISILON nodes totaling 140M objects and 230TB of data. The largest
> (our troublemaker) has over 21M objects and 26TB of data (this is the one
> that takes 4-5 days).  dsminstr.log from a recently finished run shows it
> only backed up 15K objects.
> 
> We agree that this and other similarly large nodes need to be broken up
> into smaller nodes with fewer objects to back up per node.  But the owner
> of this large one is balking, since previously this was backed up via a
> solitary Windows server using Journaling, so everything finished in a day.
> 
> We have never dealt with proxy nodes but might need to head in that
> direction since our current method of allowing users to perform their own
> restores relies on the now deprecated Web Client.  Our current method is
> numerous Windows VM servers with 20-30 nodes defined to each.
> 
> How do you handle restore requests?
> 
> On Wed, Jul 11, 2018 at 2:56 PM Robert Talda <r...@cornell.edu> wrote:
> 
