Speaking of needles in haystacks, a one-time colleague of mine was working for a company that analyzed oil seismic survey data on a large cluster of IBM machines. He was a very smart cookie and understood the geophysics of it all (Hi Stephen, if you are listening).
The data came in on reels of tape, each representing the data from one surveyed line. A full survey consisted of a series of evenly spaced lines that mapped an area. He took this data and, using the TSM API, somehow stored it on 3590s (this was back in the old ADSM 3 days).

The smart part was that oil companies could ask for an analysis of an area and give the coordinates that they wanted. His software would figure out which bits of data were needed from TSM, mount the appropriate tapes, gather the data, and then feed it into the machines for analysis. This enabled his company to get more out of its investment in surveys and also to provide the data to customers faster than anyone else could. At least that's what he told me :)

The TSM API could be used to do a whole stack of this sort of storage work, but it is hampered by the lack of a binding for a language we can use, e.g. perl, python, or my current favourite, ruby. I've taken a brief look at writing a library interface for ruby, but it is somewhat difficult - especially for an old COBOL programmer like me - why does everyone have to write in C anyway!

Has anyone on the list done any work along these lines?

Regards

Steve

Steven Harris
AIX and TSM Admin
Brisbane Australia

Robin Sharpe wrote:
Well, what I meant was "move data" or "move nodedata"... but now that I think about it, those commands will have no effect on retention. They will only move the data to different volumes, maybe in different storage pools. A "generate backupset" would make a copy that has its own retention criteria, but IMO backupsets are too hard to manage effectively... but then I haven't really used them that much. Also, backupsets will only contain active data, and so may be incomplete in a litigation context.

I think the bottom line here, unfortunately, is that we're trying to make TSM fulfill a need it was not designed for. TSM is great for backing up a system and getting it back to a known operational state. It's also great for restoring a single file, set of files, directories, filesystems, etc. It's not too useful for finding a "needle in a haystack", like "we need all emails from John Doe to XYZ Corp regarding product X"... there are archiving systems emerging that can do that kind of function. It would be nice if TSM could serve as the back end for such a system so you can minimize the back-end data store. I believe there are a couple that can do that. Of course, there's no free lunch... implementing an archive solution like that will cost significant bucks.

-Robin
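
Coming back to the seismic survey story: the lookup step (coordinates in, list of tapes to mount out) is easy enough to sketch, even without touching the TSM API. The Python below is only an illustration of the idea, not a description of how Stephen's software actually worked. It makes no TSM calls at all, and the `SurveyLine` record, the in-memory catalog, and the volume labels are all invented for the example. It assumes each surveyed line is catalogued with a bounding box and the tape volume that holds it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SurveyLine:
    """One surveyed line as catalogued on tape (all fields are hypothetical)."""
    line_id: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    volume: str  # label of the tape volume holding this line's data


def lines_in_area(catalog, x_min, y_min, x_max, y_max):
    """Return the survey lines whose bounding box overlaps the requested area."""
    return [
        line for line in catalog
        if line.x_min <= x_max and line.x_max >= x_min
        and line.y_min <= y_max and line.y_max >= y_min
    ]


def plan_tape_mounts(catalog, x_min, y_min, x_max, y_max):
    """Group the needed lines by tape volume so each tape is mounted only once."""
    plan = defaultdict(list)
    for line in lines_in_area(catalog, x_min, y_min, x_max, y_max):
        plan[line.volume].append(line.line_id)
    return {volume: sorted(ids) for volume, ids in sorted(plan.items())}


# Made-up catalog: evenly spaced east-west lines, 100 units apart, on three tapes.
CATALOG = [
    SurveyLine("L001", 0, 0, 1000, 0, "VOL001"),
    SurveyLine("L002", 0, 100, 1000, 100, "VOL001"),
    SurveyLine("L003", 0, 200, 1000, 200, "VOL002"),
    SurveyLine("L004", 0, 300, 1000, 300, "VOL002"),
    SurveyLine("L005", 0, 400, 1000, 400, "VOL003"),
]

if __name__ == "__main__":
    # A customer asks for the area x 200..600, y 150..320.
    print(plan_tape_mounts(CATALOG, 200, 150, 600, 320))
```

With this made-up catalog, the request touches only lines L003 and L004, so the plan is a single mount of VOL002. In a real system the catalog would presumably live in a database alongside the TSM object names, and the retrieval itself would go through the TSM API, which is exactly the part that still needs a binding for a scripting language.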
