I'm putting together what should be a simple little script, and failing.
I am ultimately looking to run this against a directory, then sort the
output on the hash field and then parse for duplicates. There are two
conditions that concern me: 1) there are over 3 million files in the
target directory, and 2) many of the files are quite large, over 1 GB.
I'm more concerned about the effects of the script on memory than on
processor - the data is fairly static, and I intend to run it once a
month or even less, but I did choose MD5 as the hash algorithm for
speed, rather than accept the default of SHA256.
This is pretty simple stuff, I'm sure, but I'm using this as a
learning exercise more than anything, as there are duplicate file
finders out in the world already.
There are several problems with what I have put together so far, which
is this:
Get-ChildItem c:\stuff -Recurse | select length, fullname |
export-csv -NoTypeInformation c:\temp\files.csv
Import-CSV C:\temp\files.csv | ForEach-Object { (get-filehash
-algorithm md5 $_.FullName) }; Length | Sort hash
Using Length (or $_.Length) anywhere in the foreach statement gives an
error, or gives weird output.
Sample output when Length is left out, which gives reasonable output
(extra spaces and hyphen delimiters elided):
Algorithm Hash Path
MD5 592BE1AD0ED83C36D5E68CA7A014A510 C:\stuff\Tools\SomeFile.DOC
What I'd like to see instead:
Hash Length Path
592BE1AD0ED83C36D5E68CA7A014A510 79872 C:\stuff\Tools\SomeFile.DOC
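For illustration only, here is a minimal sketch of one way that Hash / Length / Path shape could be produced, assuming a PowerShell version that has Get-FileHash, the -File switch on Get-ChildItem, and [pscustomobject]. The function name Get-HashedFile and its Path parameter are hypothetical, made up for this example. It hashes one file at a time inside the pipeline and takes Length from the file object itself, so no intermediate CSV is involved.

```powershell
# Sketch only: Get-HashedFile is a hypothetical helper name, not an existing cmdlet.
function Get-HashedFile {
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )

    # Stream files through the pipeline one at a time rather than collecting them first.
    Get-ChildItem -LiteralPath $Path -Recurse -File | ForEach-Object {
        # $_ is still the file object here, so Length and FullName are both available.
        $fileHash = Get-FileHash -Algorithm MD5 -LiteralPath $_.FullName

        # Emit one row holding only the three wanted fields, in the wanted order.
        [pscustomobject]@{
            Hash   = $fileHash.Hash
            Length = $_.Length
            Path   = $_.FullName
        }
    }
}
```

It could be called as Get-HashedFile -Path C:\stuff | Sort-Object -Property Hash. Note that Sort-Object has to hold every row before it can emit the first one, so with millions of files the sort is where the row objects accumulate in memory; the large file contents themselves are read by Get-FileHash one file at a time.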
If anyone can offer some instruction, I'd appreciate it.
Kurt