Bjørn,

Thank you for the details. The common consensus seems to be that we need to break up the number of directories/files each node processes/scans. We also seem to need the PROXY NODE mechanism to consolidate access under one node/client, since 5+ nodes will be required to process what is now being attempted through one node.
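For the proxy-node consolidation mentioned above, ISP's GRANT PROXYNODE command lets several agent nodes store data under one target node, so the filer's data stays under a single client name. A sketch of the setup (the node names FILER_TARGET and WORKER1..3 are hypothetical; the share path is the one from the schedule later in this thread):

```
Server side (dsmadmc), allowing the worker nodes to act for the target:

  GRANT PROXYNODE TARGET=FILER_TARGET AGENT=WORKER1,WORKER2,WORKER3

Client side, each worker backing up its slice under the target's name:

  dsmc incremental \\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\* -subdir=yes -asnodename=FILER_TARGET
```

Each worker then needs only its own credentials, while queries and restores all go against FILER_TARGET.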
On Tue, Jul 17, 2018 at 8:05 AM Nachtwey, Bjoern <bjoern.nacht...@gwdg.de> wrote:

> Hi Zoltan,
>
> I will come back to the approach Jonas mentioned (as I'm the author of
> that text: thanks to Jonas for pointing to it ;-) )
>
> The text is in German, of course, but the script has some comments in
> English and should be understandable -- I hope so :-)
>
> The text first describes the problem everybody on this list will know:
> the tree walk takes more time than we have. TSM/ISP has some options to
> speed this up, such as "-incrbydate", but they do not work properly.
>
> So for me the only solution is to parallelize the tree walk and do
> partial incremental backups. I first tried to write it with Bash
> commands, but multithreading was not easy to implement, and it would
> not run on Windows -- yet our largest filers (500 TB - 1.2 PB) need to
> be accessed via CIFS to store the ACL information. My first steps with
> PowerShell on Windows cost a lot of time and were disappointing. Using
> Perl made everything really easy, as it runs on Windows with the
> Strawberry Perl distribution, and within the script only a few
> if-conditions are needed to distinguish between Linux and Windows.
>
> I did some tests on how deep into the file tree to dive: as the
> subfolders are of unequal size, diving just below the mount point and
> parallelizing on the folders of this "first level" mostly does not work
> well; there is (nearly) always one folder taking all the time. On the
> other hand, diving into all levels adds a certain amount of extra time.
>
> I see the best performance using 3 to 4 levels and 4 to 6 parallel
> threads for each node. Because users are separated for accounting, I
> have several nodes on such large file systems, so in total there are
> about 20 to 40 streams in parallel.
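The strategy described above (dive a fixed number of levels below the mount point, then back up each resulting subtree in parallel) can be sketched as follows. This is an illustrative Python sketch, not the actual dsmci tool (which is written in Perl); the `dsmc` invocation and the depth/thread defaults are assumptions taken from the numbers in the mail:

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

def partition_tree(root, depth):
    """Collect the directories `depth` levels below `root`.

    Each returned directory becomes one partial-incremental work unit.
    A directory that runs out of subdirectories before `depth` is kept
    as its own work unit so nothing is skipped.
    """
    frontier = [root]
    for _ in range(depth):
        nxt = []
        for d in frontier:
            subdirs = [e.path for e in os.scandir(d)
                       if e.is_dir(follow_symlinks=False)]
            nxt.extend(subdirs if subdirs else [d])
        frontier = nxt
    return frontier

def backup(path):
    # Illustrative: one partial incremental backup per subtree.
    # NOTE: files sitting in directories *above* the chosen depth are not
    # covered by these work units; a production tool must handle them too.
    return subprocess.call(["dsmc", "incremental", path + os.sep, "-subdir=yes"])

def parallel_backup(root, depth=3, threads=6):
    # Threads are fine here: each one mostly waits on a dsmc child process.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(backup, partition_tree(root, depth)))
```

The trade-off Bjørn describes shows up directly in `depth`: too shallow and one huge folder dominates the wallclock time; too deep and the sheer number of `dsmc` start-ups adds overhead.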
>
> Rudi Wüst, mentioned in my text, found that a p520 server running AIX 6
> will support up to 2,000 parallel streams; but, as Grant mentioned, with
> an Isilon system the filer will be the bottleneck.
>
> As mentioned by Del, you may also test a commercial product, "MAGS" by
> general storage; it can address multiple Isilon nodes in parallel.
>
> If there are any questions -- just ask, or have a look at the script:
> https://gitlab.gwdg.de/bnachtw/dsmci
>
> // even if the last commit is about 4 months old, the project is still
> in development ;-)
>
> ==> Maybe I should update and translate the text from the "GWDG News"
> to English? Any interest?
>
> Best
> Bjørn
>
> p.s.
> A result from the wild (weekly backup of a node from a 343 TB Quantum
> StorNext file system):
>
> >>
> Process ID            : 12988
> Path processed        : <removed>
> -------------------------------------------------
> Start time            : 2018-07-14 12:00
> End time              : 2018-07-15 06:07
> total processing time : 3d 15h 59m 23s
> total wallclock time  : 18h 7m 30s
> effective speedup     : 4.855 using 6 parallel threads
> datatransfertime ratio: 3.575 %
> -------------------------------------------------
> Objects inspected     : 92061596
> Objects backed up     : 9774876
> Objects updated       : 0
> Objects deleted       : 0
> Objects expired       : 7696
> Objects failed        : 0
> Bytes inspected       : 52818.242 (GB)
> Bytes transferred     : 5063.620 (GB)
> -------------------------------------------------
> Number of Errors      : 0
> Number of Warnings    : 43
> # of severe Errors    : 0
> # Out-of-Space Errors : 0
> <<
>
> --------------------------------------------------------------------------------------------------
> Bjørn Nachtwey
> Working group "IT-Infrastruktur"
> Tel.: +49 551 201-2181, E-Mail: bjoern.nacht...@gwdg.de
> --------------------------------------------------------------------------------------------------
> Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
> Am Faßberg 11, 37077 Göttingen, URL:
> http://www.gwdg.de
> Tel.: +49 551 201-1510, Fax: +49 551 201-2150, E-Mail: g...@gwdg.de
> Service hotline: Tel.: +49 551 201-1523, E-Mail: supp...@gwdg.de
> Managing Director: Prof. Dr. Ramin Yahyapour
> Chairman of the Supervisory Board: Prof. Dr. Norbert Lossau
> Registered office: Göttingen
> Commercial register: Göttingen, Handelsregister-Nr. B 598
> --------------------------------------------------------------------------------------------------
> Certified according to ISO 9001
> --------------------------------------------------------------------------------------------------
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager <ADSM-L@VM.MARIST.EDU> On Behalf Of Zoltan Forray
> Sent: Wednesday, July 11, 2018 13:50
> To: ADSM-L@VM.MARIST.EDU
> Subject: Re: [ADSM-L] Looking for suggestions to deal with large backups not completing in 24-hours
>
> I will need to translate it to English, but I gather it is talking about
> the RESOURCEUTILIZATION / MAXNUMMP values. While we have increased
> MAXNUMMP to 5 on the server (we will try going higher), I am not sure
> how much good it would do, since the backup schedule uses OBJECTS to
> point to a specific/single mount point/filesystem (see below); but it is
> worth trying to bump the RESOURCEUTILIZATION value on the client even
> higher...
>
> We have checked the dsminstr.log file, and it is spending 92% of the
> time in PROCESS DIRS (no surprise).
>
> 7:46:25 AM SUN : q schedule * ISILON-SOM-SOMADFS1 f=d
>             Policy Domain Name: DFS
>                  Schedule Name: ISILON-SOM-SOMADFS1
>                    Description: ISILON-SOM-SOMADFS1
>                         Action: Incremental
>                      Subaction:
>                        Options: -subdir=yes
>                        Objects: \\rams.adp.vcu.edu\SOM\TSM\SOMADFS1\*
>                       Priority: 5
>                Start Date/Time: 12/05/2017 08:30:00
>                       Duration: 1 Hour(s)
>    Maximum Run Time (Minutes): 0
>                 Schedule Style: Enhanced
>                         Period:
>                    Day of Week: Any
>                          Month: Any
>                   Day of Month: Any
>                  Week of Month: Any
>                     Expiration:
> Last Update by (administrator): ZFORRAY
>          Last Update Date/Time: 01/12/2018 10:30:48
>               Managing profile:
>
> On Tue, Jul 10, 2018 at 4:06 AM Jansen, Jonas <jan...@itc.rwth-aachen.de> wrote:
>
> > It is possible to do a parallel backup of file system parts.
> > https://www.gwdg.de/documents/20182/27257/GN_11-2016_www.pdf (German)
> > -- have a look at page 10.
> >
> > ---
> > Jonas Jansen
> >
> > IT Center
> > Group: Server & Storage
> > Department: Systems & Operations
> > RWTH Aachen University
> > Seffenter Weg 23
> > 52074 Aachen
> > Tel: +49 241 80-28784
> > Fax: +49 241 80-22134
> > jan...@itc.rwth-aachen.de
> > www.itc.rwth-aachen.de
> >
> > -----Original Message-----
> > From: ADSM: Dist Stor Manager <ADSM-L@VM.MARIST.EDU> On Behalf Of Del Hoobler
> > Sent: Monday, July 9, 2018 3:29 PM
> > To: ADSM-L@VM.MARIST.EDU
> > Subject: Re: [ADSM-L] Looking for suggestions to deal with large
> > backups not completing in 24-hours
> >
> > They are a 3rd-party partner that offers an integrated Spectrum
> > Protect solution for large filer backups.
> >
> > Del
> >
> > ----------------------------------------------------
> >
> > "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU> wrote on 07/09/2018 09:17:06 AM:
> >
> > > From: Zoltan Forray <zfor...@vcu.edu>
> > > To: ADSM-L@VM.MARIST.EDU
> > > Date: 07/09/2018 09:17 AM
> > > Subject: Re: Looking for suggestions to deal with large backups not
> > > completing in 24-hours
> > > Sent by: "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU>
> > >
> > > Thanks, Del. Very interesting. Are they a VAR for IBM?
> > >
> > > I am not sure it would work in the current configuration we are
> > > using to back up ISILON. I have passed the info on.
> > >
> > > BTW, FWIW, when I copied/pasted the info, the Chrome spell-checker
> > > red-flagged "The easy way to incrementally backup billons of
> > > objects" (billions). So if you know anybody at the company, please
> > > pass it on to them.
> > >
> > > On Mon, Jul 9, 2018 at 6:51 AM Del Hoobler <hoob...@us.ibm.com> wrote:
> > >
> > > > Another possible idea is to look at General Storage dsmISI MAGS:
> > > >
> > > > http://www.general-storage.com/PRODUCTS/products.html
> > > >
> > > > Del
> > > >
> > > > "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU> wrote on
> > > > 07/05/2018 02:52:27 PM:
> > > >
> > > > > From: Zoltan Forray <zfor...@vcu.edu>
> > > > > To: ADSM-L@VM.MARIST.EDU
> > > > > Date: 07/05/2018 02:53 PM
> > > > > Subject: Looking for suggestions to deal with large backups not
> > > > > completing in 24-hours
> > > > > Sent by: "ADSM: Dist Stor Manager" <ADSM-L@VM.MARIST.EDU>
> > > > >
> > > > > As I have mentioned in the past, we have gone through large
> > > > > migrations to DFS-based storage on EMC ISILON hardware.
As you may recall, we back up
> > > > > these DFS mounts (about 90 at last count) using multiple
> > > > > Windows servers that run multiple ISP nodes (about 30 each),
> > > > > and they access each DFS mount/filesystem via
> > > > > -object=\\rams.adp.vcu.edu\departmentname.
> > > > >
> > > > > This has led to lots of performance issues with backups, and
> > > > > some departments are now complaining that their backups are
> > > > > running into multiple days in some cases.
> > > > >
> > > > > One such case is a department with 2 nodes, each with over
> > > > > 30 million objects. In the past, their backups were able to
> > > > > finish quicker, since they were accessed via dedicated servers
> > > > > and were able to use Journaling to reduce the scan times.
> > > > > Unless things have changed, I believe Journaling is not an
> > > > > option due to how the files are accessed.
> > > > >
> > > > > FWIW, average backups are usually <50k files and <200GB once
> > > > > the scanning finishes...
> > > > >
> > > > > Also, the idea of HSM/SPACEMANAGEMENT has reared its ugly head,
> > > > > since many of these objects haven't been accessed in many
> > > > > years. But as I understand it, that won't work either given our
> > > > > current configuration.
> > > > >
> > > > > Given the current DFS configuration (previously CIFS), what can
> > > > > we do to improve backup performance?
> > > > >
> > > > > So, any and all ideas are up for discussion. There is even
> > > > > discussion of replacing ISP/TSM due to these
> > > > > issues/limitations.
> > > > >
> > > > > --
> > > > > *Zoltan Forray*
> > > > > Spectrum Protect (p.k.a.
TSM) Software & Hardware Administrator
> > > > > Xymon Monitor Administrator | VMware Administrator
> > > > > Virginia Commonwealth University, UCC/Office of Technology Services
> > > > > www.ucc.vcu.edu | zfor...@vcu.edu | 804-828-4807
> > > > > Don't be a phishing victim - VCU and other reputable
> > > > > organizations will never use email to request that you reply
> > > > > with your password, social security number or confidential
> > > > > personal information. For more details visit
> > > > > http://phishing.vcu.edu/

--
*Zoltan Forray*
Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
Xymon Monitor Administrator | VMware Administrator
Virginia Commonwealth University, UCC/Office of Technology Services
www.ucc.vcu.edu | zfor...@vcu.edu | 804-828-4807
Don't be a phishing victim - VCU and other reputable organizations will never use email to request that you reply with your password, social security number or confidential personal information. For more details visit http://phishing.vcu.edu/
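P.S. on the weekly-backup result Bjørn quoted earlier in this thread: the reported "effective speedup" is simply total processing time divided by wallclock time, and the arithmetic checks out:

```python
# Figures from the quoted report: 3d 15h 59m 23s processing, 18h 7m 30s wallclock.
processing_s = 3 * 86400 + 15 * 3600 + 59 * 60 + 23
wallclock_s = 18 * 3600 + 7 * 60 + 30

speedup = processing_s / wallclock_s   # ~4.855, matching the report
efficiency = speedup / 6               # ~0.81 parallel efficiency with 6 threads
```

An efficiency of roughly 81% with 6 threads fits Bjørn's observation that one oversized subtree (or the filer itself) eventually limits how well the tree walk parallelizes.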