Hello, you might think of renaming the node every day, then doing an export
followed by a delete of the filespace (this will free the DB). In case of a
restore, import the needed node.

René Lambelet
Nestec S.A. / Informatique du Centre
55, av. Nestlé
CH-1800 Vevey (Switzerland)
Tel. +41'21'924'35'43 / Fax +41'21'924'28'88 / Office K4-117
email [EMAIL PROTECTED]
Visit our site: http://www.nestle.com

This message is intended only for the use of the addressee and may contain
information that is privileged and confidential.

> -----Original Message-----
> From: bbullock [SMTP:[EMAIL PROTECTED]]
> Sent: Tuesday, February 20, 2001 11:22 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Performance Large Files vs. Small Files
>
> Jeff,
> You hit the nail on the head of what is the biggest problem I face with
> TSM today. Excuse me for being long-winded, but let me explain the boat
> I'm in, and how it relates to many small files.
>
> We have been using TSM for about 5 years at our company and have finally
> got everyone on our bandwagon and away from the variety of backup
> solutions and media we had in the past. We now have 8 TSM servers running
> on AIX hosts (S80s) attached to 4 libraries with a total of 44 3590E tape
> drives. A nice beefy environment.
>
> The problem that keeps me awake at night now is that we have
> manufacturing machines wanting to use TSM for their backups. In the past
> they have used small DLT libraries locally attached to the host, but
> that's labor-intensive and they want to take advantage of our "enterprise
> backup solution". A great coup for my job security and TSM, as they now
> see the benefit of TSM.
>
> The problem with these hosts is that they generate many, many small
> files every day. Without going into any detail, each file is a test on a
> part that they may need to look at if the part ever fails. Each part gets
> many tests done to it through the manufacturing process, so many files
> are generated for each part.
>
> How many files?
> Well, I have one Solaris-based host that generates 500,000 new files a
> day in a deeply nested directory structure (about 10 levels deep, with
> only about 5 files per directory). Before I am asked: "no, they are not
> able to change the directory or file structure on the host. It runs
> proprietary applications that can't be altered". They are currently
> keeping these files on the host for about 30 days and then deleting them.
>
> I have no problem moving the files to TSM on a nightly basis; we have a
> nice big network pipe and the files are small. The problem is with the
> TSM database growth, and the number of files per filesystem (stored in
> TSM). Unfortunately, the directories are not shown when you do a 'q occ'
> on a node, so there is actually a "hidden" number of database entries
> that are taking up space in my TSM database that are not readily apparent
> when looking at the output of 'q node'.
>
> One of my TSM databases is growing by about 1.5 GB a week, with no end in
> sight. We are currently keeping those files for 180 days, but they are
> now requesting that they be kept for 5 years (in case a part gets
> returned by a customer).
>
> This one nightmare host now has over 20 million files (and an unknown
> number of directories) across 10 filesystems. We have found from
> experience that any more than about 500,000 files in any filesystem means
> a full filesystem restore would take many hours. Just to restore the
> directory structure seems to take a few hours at least. I have told the
> admins of this host that it is very much unrecoverable in its current
> state, and would take on the order of days to restore the whole box.
>
> They are disappointed that an "enterprise backup solution" can't handle
> this number of files any better. They are willing to work with us to get
> a solution that will both cover the daily "disaster recovery" backup need
> for the host and the long-term retentions they desire.
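The scale of those "hidden" directory entries can be sketched with some back-of-the-envelope arithmetic. This is purely illustrative, using the figures quoted above; it is not output from any TSM command:

```python
# Rough estimate of the directory objects TSM tracks for the host
# described above. Input figures come from the post; the calculation
# itself is only an illustration.

files_per_day = 500_000
files_per_directory = 5      # "only about 5 files per directory"
depth = 10                   # "about 10 levels deep"

# Each leaf directory holds ~5 files, so new leaf directories per day:
leaf_dirs_per_day = files_per_day // files_per_directory

# Each leaf sits under up to 9 ancestor directories. Ancestors are
# shared between leaves, so this is an upper bound, not a count:
max_dirs_per_day = leaf_dirs_per_day * depth

print(f"leaf directories/day:    {leaf_dirs_per_day:,}")
print(f"upper bound, all levels: {max_dirs_per_day:,}")
# Even the lower bound is ~100,000 extra database objects per day
# that never show up in 'q occ'.
```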
>
> I am pushing back and telling them that their desire to keep it all for
> 5 years is unreasonable, but thought I'd bounce it off you folks to see
> if there was some TSM solution that I was overlooking.
>
> There are 2 ways to control database growth: reduce the number of
> database entries, or reduce the retention time.
>
> Here is what I've looked into so far.
>
> 1. Cut the incremental backup retention down to 30 days and then
> generate a backup set every 30 days for long-term retention.
> On paper it looks good: you don't have to move the data over the net
> again, and there is only 1 database entry. Well, I'm not sure how many of
> you have tried this on a filesystem with many files, but I tried it twice
> on a filesystem with only 20,000 files and it took over 1 hour to
> complete. Doing the math, it would take over 100 hours to do each of
> these 2-million-file filesystems. Doesn't seem really feasible.
>
> 2. Cut the incremental backup retention down to 30 days and run an
> archive every 30 days to the 5-year management class.
> This would cut down the number of files we are tracking with the
> incrementals, so a full filesystem restore from the latest backup would
> have less garbage to sort through and hopefully run quicker. Yet with the
> archives, we would have to move the 600 GB over the net every 30 days and
> would still end up tracking the millions of individual files for the next
> 5 years.
>
> 3. Use TSM as a disaster recovery solution with a short 30-day retention,
> and use some other solution (like a local CD/DVD burner) to get the
> 5-year retention they desire. Still looking into this one, but they don't
> like it because it once again becomes a manual process to swap out CDs.
>
> 4. Use TSM as a disaster recovery solution (with a short 30-day
> retention) and have a process tar up all the 30-day-old files into one
> large file, then have TSM do an archive and delete the .tar file.
> This would mean we only track 1 large tar file for every day for the
> 5-year time (about 1800 files). This is the option we are currently
> pursuing.
>
> Any other options or suggestions from the group? Any other backup
> solutions you have in place for tracking many files over longer periods
> of time?
>
> If you made it this far through this long e-mail, thanks for letting me
> drone on.
>
> Thanks,
> Ben Bullock
> UNIX Systems Manager
> Micron Technology
>
> > -----Original Message-----
> > From: Jeff Connor [mailto:[EMAIL PROTECTED]]
> > Sent: Thursday, February 15, 2001 12:01 PM
> > To: [EMAIL PROTECTED]
> > Subject: Re: Performance Large Files vs. Small Files
> >
> > Diana,
> >
> > Sorry to chime in late on this, but you've hit a subject I've been
> > struggling with for quite some time.
> >
> > We have some pretty large Windows NT file and print servers using
> > MSCS. Each server has lots of small files (1.5 to 2.5 million) and
> > total disk space (the D: drive) between 150GB and 200GB; Compaq
> > servers, two 400MHz Xeons with 400MB RAM. We have been running TSM on
> > the mainframe since ADSM version 1 and are currently at 3.7 of the TSM
> > server, with 3.7.2.01 and 4.1.2 on the NT clients.
> >
> > Our Windows NT admins have had a concern for quite some time regarding
> > TSM restore performance and how long it would take to restore that big
> > old D: drive. They don't see the value in TSM as a whole as compared
> > to the competition; they just want to know how fast you can recover
> > their entire D: drive. They decided they wanted to perform weekly full
> > backups to direct-attached DLT drives using Arcserve and would use the
> > TSM incrementals to forward-recover during a full volume restore. We
> > finally had to recover one of those big D: drives this past September.
> > The Arcserve portion of the recovery took about 10 hours, if I recall
> > correctly.
> > The TSM forward recovery ran for 36 hours and only restored about
> > 8.5GB. They were not pleased. It seems all that comparing took quite
> > some time. I've been trying to get to the root of the bottleneck since
> > then. I've worked with support on and off over the last few months,
> > performing various traces and the like. At this point we are looking
> > in the area of mainframe TCP/IP and delays in acknowledgments coming
> > out of the mainframe during test restores.
> >
> > If you've worked with TSM for a number of years, then through sources
> > in IBM/Tivoli and the valuable information from this listserv, over
> > time you learn about all the TSM client and server "knobs" to turn to
> > try and get maximum performance: things like Bufpoolsize, database
> > cache hits, housekeeping processes running at the same time as
> > backups/restores slowing things down, network issues like
> > auto-negotiate on NICs, MTU sizes, TSM server database and log disk
> > placement, tape drive load/seek times, and speeds and feeds.
> > Basically, I think we are pretty well set with all those important
> > things to consider. This problem we are having may be a mainframe
> > TCP/IP issue in the end, but I am not sure that will be the complete
> > picture.
> >
> > We have recently installed an AIX TSM server: H80 two-way, 2GB memory,
> > 380GB EMC 3430 disk, 6 Fibre Channel 3590-E1A drives in a 3494, TSM
> > server at 4.1.2. We plan to move most of the larger clients from the
> > TSM OS/390 server to the AIX TSM server. A good move to realize a
> > performance improvement, according to many posts on this listserv over
> > the years. I am in the process of testing my NT "problem children" as
> > quickly as I can to prove this configuration will address the concerns
> > our NT admins have about restores of large NT servers.
> > I'm trying to prevent them from installing a Veritas SAN solution and
> > asking them to stick with our enterprise backup strategic direction,
> > which is to utilize TSM. As you probably know, the SAN-enabled TSM
> > backup/archive client for NT is not here and may never be, from what
> > I've heard. My only option at this point is SAN tape library sharing,
> > with the TSM client and server on the same machine for each of our
> > MSCS servers.
> >
> > Now I'm sure many of you reading this may be thinking of things like,
> > "why not break the D: drive into smaller partitions so you can
> > collocate by filespace and restore all the data concurrently". No go,
> > guys; they don't want to change the way they configure their servers
> > just to accommodate TSM when they feel they would not have to with
> > other products. They feel that with 144GB single drives around the
> > corner, who is to say what a "big" NT partition is? NT seems to
> > support these large drives without issues. (Their words, not mine.)
> >
> > Back to the issue. Our initial backup tests using our new AIX TSM
> > server have produced significant improvements in performance. I am
> > just getting the pieces in place to perform restore tests. My first
> > test a couple of days ago was to restore part of the data from that
> > server we had the issue with in September. It took about one hour to
> > lay down just the directories before restoring any files. Probably
> > still better than the mainframe, but not great. My plan for future
> > tests is to perform backups and restores of the same data to and from
> > both of my TSM servers to compare performance. I will share the
> > results with you and the rest of the listserv as I progress.
> >
> > In general I have always, like many other TSM users, achieved much
> > better restore/backup rates with larger files versus lots of smaller
> > files.
> > Assuming you've done all the right tuning, the question that comes to
> > my mind is: does it really come down to the architecture? The TSM
> > database makes things very easy for day-to-day smaller recoveries,
> > which is the type we perform most. But does the architecture that
> > makes day-to-day operations easier not lend itself well to
> > backup/recovery of large amounts of data made up of small files? I
> > have very little experience with competing products. Do they struggle
> > with lots of small files as well? Veritas, Arcserve, anyone? If the
> > issue is, as some on the listserv have suggested, that frequent
> > interaction with the client file system is the bottleneck, then I
> > suppose the answer would be yes, the other products have the same
> > problem. Or is the issue more on the TSM database side due to its
> > design, and other products using different architectures may not have
> > this problem? Maybe the competition's architecture is less
> > bulletproof, but if you're one of our NT admins you don't seem to care
> > when the client keeps calling asking how much longer the restore will
> > be running. I know TSM development is aware of the issues with lots of
> > small files, and I would be curious what they plan to do about the
> > problems Diana and I have experienced.
> >
> > The newer client option, Resourceutilization, has helped with backing
> > up clients with lots of small files more quickly. I would love to see
> > the same type of automated multi-tasking on restores. I don't know the
> > specifics of how this actually works, but it seems to me that when I
> > ask to restore an entire NT drive, for example, the TSM client/server
> > must sort the file list in some fashion to intelligently request tape
> > volumes to minimize the mounts required.
> > If that's the case, could they take things one step further and add
> > an option to the restore specifying the number of concurrent
> > sessions/mountpoints to be used to perform the restore? For example,
> > if I have a node whose collocated data is spread across twenty tapes
> > and I have 6 tape drives available for the recovery, how about an
> > option for the restore command like:
> >
> > RES -subd=y -nummp=6 d:\*
> >
> > where the -nummp option would be the number of mount points/tape
> > drives to be used for the restore. TSM could sort the file list,
> > coming up with the list of tapes to be used for the restore, and
> > perhaps spread the mounts across 6 sessions/mount points. I'm sure
> > I've probably made a complex task sound simple, but this type of
> > option would be very useful. I think many of us have seen the benefits
> > of running multiple sessions to reduce recovery elapsed time. I find
> > my current choices for doing so difficult to implement or politically
> > undesirable.
> >
> > If others have the same issues with lots of small files, in
> > particular with Windows NT clients, let's hear from you. Maybe we can
> > come up with some enhancement requests. I'll pass on the results of my
> > tests as stated above. I'd be interested in hearing from those of you
> > that have worked with other products and can tell me if they have the
> > same performance problems with lots of small files. If the performance
> > of other products is impacted in the same way as TSM performance, then
> > that would be good to know. If it's more about the Windows NT NTFS
> > file system, then I'd be satisfied with that explanation as well. If
> > it's that lots of interaction with the TSM database leads to slower
> > performance, even when optimally configured, then I'd like to know
> > what Tivoli has in the works to address the issue.
> > Because if it's the TSM database, I could probably install the
> > fattest Fibre Channel/network pipe with the fastest peripherals and
> > server hardware around, and it might not change a thing.
> >
> > Thanks
> > Jeff Connor
> > Niagara Mohawk Power Corp.
> >
> > "Diana J.Cline" <[EMAIL PROTECTED]> on 02/14/2001 10:04:52 AM
> >
> > Please respond to "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
> >
> > Sent by: "ADSM: Dist Stor Manager" <[EMAIL PROTECTED]>
> >
> > To: [EMAIL PROTECTED]
> > cc:
> >
> > Subject: Performance Large Files vs. Small Files
> >
> > Using an NT client and an AIX server:
> >
> > Does anyone have a TECHNICAL reason why I can back up 30GB of 2GB
> > files that are stored in one directory so much faster than 30GB of 2KB
> > files that are stored in a bunch of directories?
> >
> > I know that this is the case; I just would like to find out why. If
> > the amount of data is the same and the Network Data Transfer Rate is
> > the same between the two backups, why does it take the TSM server so
> > much longer to process the files sent by the backup with the larger
> > number of files in multiple directories?
> >
> > I sure would like to have the answer to this. We are trying to
> > complete an incremental backup of an NT server with about 3 million
> > small objects (according to TSM) in many, many folders, and it can't
> > even get done in 12 hours. The actual amount of data transferred is
> > only about 7GB per night. We have other backups that can complete 50GB
> > in 5 hours, but they are in one directory and the # of files is
> > smaller.
> >
> > Thanks
> >
> > Network data transfer rate
> > --------------------------
> > The average rate at which the network transfers data between the TSM
> > client and the TSM server, calculated by dividing the total number of
> > bytes transferred by the time to transfer the data over the network.
> > The time it takes for TSM to process objects is not included in the
> > network transfer rate. Therefore, the network transfer rate is higher
> > than the aggregate transfer rate.
> >
> > Aggregate data transfer rate
> > ----------------------------
> > The average rate at which TSM and the network transfer data between
> > the TSM client and the TSM server, calculated by dividing the total
> > number of bytes transferred by the time that elapses from the
> > beginning to the end of the process. Both TSM processing and network
> > time are included in the aggregate transfer rate. Therefore, the
> > aggregate transfer rate is lower than the network transfer rate.
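The relationship between the two rates above comes down to the denominator: network time only, versus total elapsed time. A quick calculation makes the gap concrete (the numbers are made up for illustration):

```python
# Network vs. aggregate transfer rate, per the definitions above.
# Figures are invented for illustration only.

bytes_transferred = 7 * 1024**3   # 7 GB moved during the backup
network_seconds = 2 * 3600        # time spent actually on the wire
elapsed_seconds = 12 * 3600       # wall-clock time for the session

# Network rate: bytes divided by time on the network only.
network_rate = bytes_transferred / network_seconds

# Aggregate rate: bytes divided by total elapsed time,
# TSM per-object processing included.
aggregate_rate = bytes_transferred / elapsed_seconds

print(f"network rate:   {network_rate / 1024**2:.2f} MB/s")
print(f"aggregate rate: {aggregate_rate / 1024**2:.2f} MB/s")
```

With millions of small objects, the difference between the two rates is exactly the per-object processing overhead: the network can look healthy while the session still takes all night.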
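Ben's option 4 earlier in the thread (tar up the 30-day-old files, archive the single tarball, delete it) could be scripted along these lines. This is a sketch under stated assumptions: the tree layout, the staging path, and the `dsmc archive` invocation (shown only as a comment) are hypothetical, not a tested procedure:

```python
# Sketch of option 4: bundle files older than 30 days into a single tar
# file so TSM tracks one object instead of hundreds of thousands.
# Paths, file names, and the dsmc command are hypothetical.

import os
import tarfile
import tempfile
import time
from pathlib import Path

def bundle_old_files(source: Path, tarball: Path, max_age_days: int = 30) -> int:
    """Tar up every file under `source` older than `max_age_days`.
    Returns the number of files bundled."""
    cutoff = time.time() - max_age_days * 86400
    count = 0
    with tarfile.open(tarball, "w") as tar:
        for path in source.rglob("*"):
            if path.is_file() and path.stat().st_mtime < cutoff:
                tar.add(path, arcname=str(path.relative_to(source)))
                count += 1
    return count

# Demo on a throwaway tree: one "old" file and one fresh file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "tests")
    src.mkdir()
    old = src / "part_0001.dat"
    old.write_text("old test result")
    forty_days_ago = time.time() - 40 * 86400
    os.utime(old, (forty_days_ago, forty_days_ago))
    (src / "part_0002.dat").write_text("fresh test result")

    tarball = Path(tmp, "tests-30d.tar")
    bundled = bundle_old_files(src, tarball)
    print(f"bundled {bundled} file(s)")  # only the 40-day-old file qualifies
```

The tarball would then be archived as one object, e.g. `dsmc archive /staging/tests-30d.tar -archmc=FIVEYEAR` (the path and management class name are made up), and the originals plus the tarball deleted after the archive succeeds.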
Re: Performance Large Files vs. Small Files
Lambelet,Rene,VEVEY,FC-SIL/INF. Sun, 25 Feb 2001 23:58:42 -0800
