In the sample code that you provided, you're only outputting the result of
Get-FileHash to the pipeline, but it sounds like you want to add it to the
objects that Import-CSV produces. Here are three ways of doing that:
# List files only (-File skips directories, which Get-FileHash cannot hash)
Get-ChildItem . -Recurse -File | Select-Object Length, FullName |
    Export-Csv -NoTypeInformation $env:TEMP\files.csv
# Method 1
# Add-Member with -PassThru
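Import-CSV $env:TEMP\files.csv | ForEach-Object {
    # -LiteralPath keeps names containing [ or ] from being read as wildcards
    $hash = (Get-FileHash -Algorithm MD5 -LiteralPath $_.FullName).Hash
    $_ | Add-Member -MemberType NoteProperty -Name Hash -Value $hash -PassThru
} | Sort-Object Hash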
# Method 2
# Create a PSCustomObject with hash table
Import-CSV $env:TEMP\files.csv | ForEach-Object {
    [pscustomobject]@{
        Hash   = (Get-FileHash -Algorithm MD5 -LiteralPath $_.FullName).Hash
        Length = $_.Length
        Path   = $_.FullName
    }
} | Sort-Object Hash
# Method 3
# Select-Object with Name/Expression hash table as a Property parameter
Import-CSV $env:TEMP\files.csv |
    Select-Object -Property @{Name="Hash";Expression={(Get-FileHash -Algorithm MD5 -LiteralPath $_.FullName).Hash}}, Length, FullName |
    Sort-Object Hash
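Since the end goal is to find duplicates rather than just a sorted list, one way to finish the job is to group the objects by Hash and keep only the groups with more than one member. This is a minimal sketch, assuming PowerShell 4.0 or later (for Get-FileHash); Find-DuplicateFile and its -Root parameter are hypothetical names. It walks the directory directly instead of going through the CSV, though the hashing stage could just as well start from Import-CSV.

```powershell
# Minimal sketch, assuming PowerShell 4.0+ (Get-FileHash).
# Find-DuplicateFile is a hypothetical helper, not a built-in cmdlet.
function Find-DuplicateFile {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Root
    )

    Get-ChildItem -LiteralPath $Root -Recurse -File |
        ForEach-Object {
            [pscustomobject]@{
                Hash   = (Get-FileHash -Algorithm MD5 -LiteralPath $_.FullName).Hash
                Length = $_.Length
                Path   = $_.FullName
            }
        } |
        Where-Object {
            # Skip files Get-FileHash could not read (no Hash value).
            $_.Hash
        } |
        Group-Object -Property Hash |
        Where-Object {
            # A group with more than one member is a set of identical files.
            $_.Count -gt 1
        } |
        ForEach-Object {
            $size = $_.Group[0].Length
            [pscustomobject]@{
                Hash        = $_.Name
                Copies      = $_.Count
                # Every copy beyond the first is redundant space.
                WastedBytes = $size * ($_.Count - 1)
                Paths       = $_.Group.Path
            }
        } |
        Sort-Object -Property WastedBytes -Descending
}
```

For example, Find-DuplicateFile -Root C:\stuff | Select-Object -First 20 would list the 20 groups wasting the most space. Note that Group-Object and Sort-Object hold every result in memory, so with around 3 million files this is not a low-memory approach.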
-----Original Message-----
From: [email protected] [mailto:[email protected]] On
Behalf Of Kurt Buff
Sent: Thursday, July 30, 2015 5:09 PM
To: [email protected]
Subject: Re: [powershell] Need some pointers on an exercise I've set for myself
The file store is approaching 3 TB now, with just about 290 GB free on a
3.1 TB partition.
The concern is that I've noticed a fair number of ISO files (and potentially a
lot of other files, including zip and other archives, MPEGs, etc.) that seem
to be duplicates of each other.
I want to generate a report for the VP of engineering to let him know how bad
the situation is; I'm going to guess there's close to 1 TB of redundancy
right now.
Yes, this will consume hours of time, but I can launch it over a weekend and
take a look the following Monday.
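One common way to cut those hours down: two files can only be identical if they are the same size, so any file whose Length is unique in the tree never needs to be hashed. Below is a minimal sketch of that size pre-filter, again assuming PowerShell 4.0+; Get-CandidateHash is a hypothetical name.

```powershell
# Minimal sketch, assuming PowerShell 4.0+ (Get-FileHash).
# Get-CandidateHash is a hypothetical helper name.
function Get-CandidateHash {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Root
    )

    Get-ChildItem -LiteralPath $Root -Recurse -File |
        Group-Object -Property Length |
        Where-Object {
            # A file whose size is unique cannot have a duplicate, so skip it.
            $_.Count -gt 1
        } |
        ForEach-Object {
            # Emit the FileInfo objects that share this size.
            $_.Group
        } |
        ForEach-Object {
            [pscustomobject]@{
                Hash   = (Get-FileHash -Algorithm MD5 -LiteralPath $_.FullName).Hash
                Length = $_.Length
                Path   = $_.FullName
            }
        }
}
```

Its output has the same Hash/Length/Path shape as Method 2, so it can be piped into Group-Object Hash in the same way. The trade-off is memory: Group-Object Length buffers one object per file before any hashing starts.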
I like your idea of restartability, though - it's worth looking at as a
secondary goal.
Kurt
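James's restartability suggestion below could be sketched like this, assuming the file list has already been written by the Export-Csv step (columns Length and FullName) and that results are appended to a second CSV. Invoke-ResumableHash and its -ListCsv/-ResultCsv parameters are hypothetical names. The function skips any path already present in the results file, so an interrupted run can simply be started again.

```powershell
# Minimal sketch, assuming PowerShell 4.0+ (Get-FileHash; Export-Csv -Append needs 3.0+).
# Invoke-ResumableHash, -ListCsv and -ResultCsv are hypothetical names.
function Invoke-ResumableHash {
    param(
        [Parameter(Mandatory = $true)]
        [string]$ListCsv,    # file list with Length and FullName columns
        [Parameter(Mandatory = $true)]
        [string]$ResultCsv   # Hash/Length/Path rows, appended as we go
    )

    # Paths already recorded by an earlier (possibly interrupted) run.
    $done = New-Object 'System.Collections.Generic.HashSet[string]' ([System.StringComparer]::OrdinalIgnoreCase)
    if (Test-Path -LiteralPath $ResultCsv) {
        Import-Csv -LiteralPath $ResultCsv | ForEach-Object { [void]$done.Add($_.Path) }
    }

    Import-Csv -LiteralPath $ListCsv |
        Where-Object { -not $done.Contains($_.FullName) } |
        ForEach-Object {
            $row = [pscustomobject]@{
                # An unreadable file gets an empty Hash and is not retried on restart.
                Hash   = (Get-FileHash -Algorithm MD5 -LiteralPath $_.FullName).Hash
                Length = $_.Length
                Path   = $_.FullName
            }
            # Write each row immediately, so an interruption loses at most
            # the file that was being hashed at that moment.
            $row | Export-Csv -LiteralPath $ResultCsv -Append -NoTypeInformation
        }
}
```

Writing one row at a time is slower than a single Export-Csv at the end, but it is what makes the run resumable.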
On Thu, Jul 30, 2015 at 2:19 PM, James Button <[email protected]>
wrote:
> I'm not experienced enough in PowerShell to suggest code, BUT I would
> advise that you make the process restartable, so that it can be
> interrupted (if not by Esc or Ctrl+C, then by killing the task) and, when
> restarted, will continue processing the list of files from the one after
> the last one for which a result was recorded.
>
> Working on the basis that you have a 1TB file store and are working towards
> a 3 or 6TB file store, even assuming your file store connection runs at
> 8Gb/sec (that is, about 60GB per minute), that's surely going to be an
> hour's full-time use of the interface, and I'd really expect the hashing
> process to take getting on for a day elapsed if the system is running on
> spinning media over a more common interface connection, rather than on a
> solid-state store on the fastest possible multi-channel interface.
>
> You may also need to consider the system overhead in assembling the
> list of files: the sheer volume of MFT entries to be processed. I know this
> from a fair amount of restructuring work I used to do for clients on a 4GB
> memory system with caddy'd drives, such as renaming the files that filled a
> 1TB drive for use as 'home drives', back before the admin facilities had
> all the maintenance goodies.
>
> (I took a complete list of files, put it in Excel, sorted it there, and
> generated a set of rename commands.)
>
> It took more time processing the MFT entries to "rename" the files in situ
> than it did to copy them to another drive with the new names, simply
> because of the thrashing on the MFT blocks in the OS-allocated disk read
> cache.
>
> JimB
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Kurt Buff
> Sent: Thursday, July 30, 2015 8:45 PM
> To: [email protected]
> Subject: [powershell] Need some pointers on an exercise I've set for
> myself
>
> I'm putting together what should be a simple little script, and failing.
>
> I am ultimately looking to run this against a directory, then sort the
> output on the hash field and then parse for duplicates. There are two
> conditions that concern me: 1) there are over 3 million files in the target
> directory, and 2) many of the files are quite large, over 1 GB.
>
> I'm more concerned about the effects of the script on memory than on
> processor - the data is fairly static, and I intend to run it once a
> month or even less, but I did choose MD5 as the hash algorithm for
> speed, rather than accept the default of SHA256.
>
> This is pretty simple stuff, I'm sure, but I'm using this as a
> learning exercise more than anything, as there are duplicate file
> finders out in the world already.
>
> There are several problems with what I have put together so far, which
> is this:
>
> Get-ChildItem c:\stuff -Recurse | select length, fullname |
> export-csv -NoTypeInformation c:\temp\files.csv
> Import-CSV C:\temp\files.csv | ForEach-Object { (get-filehash -algorithm md5 $_.FullName) }; Length | Sort hash
>
> Using Length (or $_.Length) anywhere in the foreach statement gives an
> error, or gives weird output.
>
> Sample output when not using Length, and therefore getting reasonable
> output (extra spaces and hyphen delimiters elided):
> Algorithm Hash Path
> MD5 592BE1AD0ED83C36D5E68CA7A014A510 C:\stuff\Tools\SomeFile.DOC
>
> What I'd like to see instead:
> Hash Length Path
> 592BE1AD0ED83C36D5E68CA7A014A510 79872 C:\stuff\Tools\SomeFile.DOC
>
> If anyone can offer some instruction, I'd appreciate it.
>
> Kurt
>
>
> ================================================
> Did you know you can also post and find answers on PowerShell in the forums?
> http://www.myitforum.com/forums/default.asp?catApp=1
>