Joel Divekar wrote:
Hi All,
We have a Windows-based file server with thousands of
user accounts, and each user has thousands of files in
his home directory. Most of these files are duplicates
or modified/updated versions of existing files. They
are .doc, .xls, or .ppt files shared by groups or
departments.
Because of this the server is holding a terabyte of
data, most of it redundant, and our sysadmin has a
tough time maintaining storage space.
So I thought of writing a small program to locate
similar or duplicate files stored on the file server
and delete them with the user's help. The program
should run very fast, and I don't know where to
start.
Can anybody point me to some links on how to start?
From there I will take it up. I would also like to
know whether there is a long-term solution for this
problem. I am comfortable with Linux and shell
programming.
Please advise. Thanks a lot.
Regards
Joel
Mumbai, India
9821421965
File::Find is one possibility, except that it seems to behave badly when
files are being modified while the tree is being walked. My experience
of 'badly' is duplicated results. Nothing fatal, but something to be
aware of.
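
A minimal File::Find sketch for collecting the candidate files (the
starting directory and extension filter here are assumptions; adjust
them to your tree):

#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Assumed share root -- point this at the top of the home directories.
my $top = '/srv/home';

my @candidates;
find(
    sub {
        return unless -f $_;                  # plain files only
        return unless /\.(doc|xls|ppt)$/i;    # office documents only
        push @candidates, $File::Find::name;  # remember the full path
    },
    $top,
);

print scalar(@candidates), " files to examine\n";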
So you want to build a hash of FullPath => MD5 digest, then invert it
into a second hash of digest => [files]. If a digest has more than one
filename associated with it, those files are duplicates, and at that
point you probably want more stat information (mtime) to decide which
copies to purge. A rough sketch follows.
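
Something like this, using Digest::MD5 (untested; @candidates would
come from the File::Find pass above):

use strict;
use warnings;
use Digest::MD5;

my %digest_of;    # full path  => hex MD5 digest
my %files_with;   # hex digest => [ list of paths ]

for my $path (@candidates) {
    open my $fh, '<:raw', $path or do { warn "skip $path: $!"; next };
    $digest_of{$path} = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    push @{ $files_with{ $digest_of{$path} } }, $path;
}

# Any digest with two or more paths is a duplicate set; sort by mtime
# so the newest copy is kept and the rest are offered for deletion.
for my $md5 (keys %files_with) {
    my @dups = @{ $files_with{$md5} };
    next unless @dups > 1;
    my @by_age = sort { (stat $b)[9] <=> (stat $a)[9] } @dups;  # newest first
    print "keep  $by_age[0]\n";
    print "purge $_\n" for @by_age[ 1 .. $#by_age ];
}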
This could probably be done in RAM if you are under 10^6 files.
Even if you can't hold the entire tree, you could at least do it in
chunks, for example only looking at files within a given size range
until you pare things down a little, as in the sketch below.
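
One way to do the chunking is to group by file size first and only MD5
the files whose size collides with another file's; a sketch:

use strict;
use warnings;

my %by_size;
for my $path (@candidates) {
    push @{ $by_size{ -s $path } }, $path;    # -s gives the size in bytes
}

# Only size groups with more than one member can contain duplicates,
# so only those files need to be hashed -- this keeps the working set small.
my @worth_hashing = map  { @{ $by_size{$_} } }
                    grep { @{ $by_size{$_} } > 1 }
                    keys %by_size;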