Don't use the ISE. :-)

________________________________
From: Kurt Buff <[email protected]>
Sent: 8/6/2015 1:09 PM
To: [email protected]
Subject: Re: [powershell] Re: Need some pointers on an exercise I've set for myself

Sorry, yes, when I said I ran it manually, I meant that I ran it from
the normal shell, not from the ISE.

Kurt

On Thu, Aug 6, 2015 at 12:50 PM, Michael B. Smith <[email protected]> wrote:
> Do you get different behavior running it from the normal shell?
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] 
> On Behalf Of Kurt Buff
> Sent: Thursday, August 6, 2015 2:20 PM
> To: [email protected]
> Subject: [powershell] Re: Need some pointers on an exercise I've set for 
> myself
>
> Getting much closer...
>
> When running this line of code:
>
> Import-CSV C:\temp\IT-files.csv | Select-Object -Property 
> @{Name="Hash";Expression={(get-filehash -algorithm md5 -literalPath 
> $_.FullName).Hash}},Length,FullName | export-csv -NoTypeInformation 
> c:\temp\IT-filehash.csv
>
> I get 18 files that don't get a hash (out of 22,727 files, so I'm not hugely 
> fussed about it). So, out of curiosity, I ran get-filehash against them 
> manually, that is, not as an entry in a CSV file.
>
> For one of them, I've identified why: someone has it open for writing, which,
> once I think about it, is not unexpected.
>
> But I'm not seeing error output in the ISE for that file or for the rest,
> which is a bit strange. For the files that aren't open, when I manually
> run get-filehash against them, I get a hash just fine.
>
> So, for grins, I ran it again from the ISE, against a CSV file containing
> only the headers and the list of files that didn't hash originally. I *still*
> don't get a hash, or an error code for the file that's open for write. The
> files that don't get a hash are just PDF and DOC files.
>
> Anyone run into anything like this?
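
One guess, not a diagnosis: as far as I know, Select-Object swallows errors raised inside a calculated property's Expression block and simply leaves the property empty, which would match missing hashes with no error output. The sketch below moves the hashing into a hypothetical helper, Get-HashRow (the name and the extra Error column are assumptions, not anything from this thread), that catches the failure and records its message so the CSV itself shows why a file was skipped.

```powershell
# Sketch only: Get-HashRow is a hypothetical helper, not part of PowerShell.
function Get-HashRow {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        $Row   # an object with FullName and Length, e.g. a row from Import-Csv
    )
    process {
        $hash = $null
        $problem = $null
        try {
            # -ErrorAction Stop turns Get-FileHash failures into catchable exceptions
            $hash = (Get-FileHash -Algorithm MD5 -LiteralPath $Row.FullName -ErrorAction Stop).Hash
        }
        catch {
            # Keep the reason (locked file, path not found, ...) instead of losing it
            $problem = $_.Exception.Message
        }
        [pscustomobject]@{
            Hash     = $hash
            Length   = $Row.Length
            FullName = $Row.FullName
            Error    = $problem
        }
    }
}
```

It would slot into the original pipeline in place of the Select-Object step:

    Import-Csv C:\temp\IT-files.csv | Get-HashRow | Export-Csv -NoTypeInformation C:\temp\IT-filehash.csv

If the skipped PDF and DOC files have non-ASCII characters in their names, it may also be worth adding -Encoding UTF8 to the Export-Csv that builds the file list, since Export-Csv in Windows PowerShell defaults to ASCII and the path read back from the CSV may no longer match the file on disk.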
>
> On Mon, Aug 3, 2015 at 5:25 PM, Kurt Buff <[email protected]> wrote:
>> Replying to myself, since that seems the reasonable thing to do here.
>>
>> I've tested the following against a smaller directory that I know has
>> some duplicates, and am making progress. Here is what I have so far
>> (work with the line wraps!):
>>
>> Get-ChildItem S:\ -File -Recurse | select fullname, length |
>> Export-CSV -NoTypeInformation c:\temp\files.csv
>>
>> Import-CSV c:\temp\files.csv | Select-Object -Property
>> @{Name="MD5";Expression={(Get-Filehash -algorithm md5
>> $_.FullName).Hash}},Length,FullName | Export-CSV -NoTypeInformation
>> c:\temp\filehash.csv
>>
>> Import-CSV C:\temp\checker\fileMD5.csv | Sort-Object
>> @{Expression={$_.Length -as [int]}} | Export-CSV -NoTypeInformation
>> c:\temp\checker\FileMD5Sorted.csv
>>
>> The above generates a file of 315286 lines (not including header) - of
>> course, that's the number of files in the directory tree. I get output
>> that looks like this (work with the line wraps again):
>>
>> "MD5","Length","FullName"
>> "6467C3875955DF4514395F0AFCAAA62A","3182604288","S:\Infrastructure\Microsoft\OSes\Win7EntSP1_64bit\SW_DVD5_SA_Win_Ent_7w_SP1_64BIT_English_-2_MLF_X17-58882.ISO"
>>
>> I noticed two oddities, however:
>>
>> o- zero-length files generate a hash, and of course the hash is the
>> same for all of them. I probably should have expected that, but it
>> surprised me.
>>
>> o- I find a handful of files (22 of them) at the top of the csv file
>> after sorting that don't seem to obey the sorting on the hash that the
>> other files followed. It's very strange. They're not duplicates of any
>> other files; their hashes and file sizes are out of sort order from
>> all of the rest, AFAICT. I'm not sure what to make of that.
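
A possible explanation for the second oddity, offered as a guess: the sort key above is `$_.Length -as [int]`, and a 32-bit int tops out at 2147483647. For any file larger than that (the 3182604288-byte ISO in the sample row is one), `-as [int]` quietly returns $null, and Sort-Object puts null keys first. A small sketch of the behaviour, assuming the lengths arrive as strings the way Import-Csv delivers them (the two smaller values are made up):

```powershell
# The ISO size from the sample row, as the string Import-Csv would hand back
$big = '3182604288'

# Too large for a 32-bit int: -as gives $null instead of raising an error
$asInt = $big -as [int]

# A 64-bit long holds it comfortably
$asLong = $big -as [long]

# Sorting on the [long] value keeps large files in their proper place
$sorted = '79872', '3182604288', '0' | Sort-Object @{ Expression = { $_ -as [long] } }
```

If that is what is happening, changing `[int]` to `[long]` in the Sort-Object expression should put those files back in order.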
>>
>> But, ignoring those two things, I'd like to proceed a bit further:
>>
>> o- Writing to another file only those lines that are duplicate files,
>> which I can do by selecting the lines that have matching
>> hashes (and possibly also matching sizes)
>>
>> o- Possibly adding another column, which would contain an integer that
>> would increment for each set of matched files, which would probably
>> lead to...
>>
>> o- Among other things, calculating the amount of duplicated space (sum
>> of n-1 file sizes for each set of dupes), identifying duplicate
>> directories that can be eliminated in toto, etc.
>>
>> But, I'm stymied on the execution of the logic. I'm such an
>> inexperienced programmer that I'm flailing on the first of these
>> steps. I believe I need to make a stepwise comparison of the MD5
>> column, which I think would look something like this:
>>
>>      $dupe = 1
>>      read infile.line1 into variable1
>>      read infile.line2 into variable2
>>      if (variable1.MD5 -eq variable2.MD5) {
>>           prefix variable1 with $dupe
>>           write variable1 to the new csv file
>>      }
>>      while not eof
>>           set variable1 to the contents of variable2
>>           read next line into variable2
>>           if (variable1.MD5 -eq variable2.MD5)
>>                prefix variable1 with $dupe
>>                append variable1 as new line of new csv file
>>           else
>>                increment $dupe
>>      endwhile
>>
>> I realize I could be way off base on the algorithm here, but that's
>> what I've been able to dream up.
>>
>> Anyone care to critique and offer syntax suggestions? My googlefu is
>> about exhausted.
>>
>> Kurt
>>
>> On Thu, Jul 30, 2015 at 12:45 PM, Kurt Buff <[email protected]> wrote:
>>> I'm putting together what should be a simple little script, and failing.
>>>
>>> I am ultimately looking to run this against a directory, then sort
>>> the output on the hash field and then parse for duplicates. There are
>>> two conditions that concern me: 1) there are over 3 million files in the
>>> target directory, and 2) many of the files are quite large, over 1 GB.
>>>
>>> I'm more concerned about the effects of the script on memory than on
>>> processor - the data is fairly static, and I intend to run it once a
>>> month or even less, but I did choose MD5 as the hash algorithm for
>>> speed, rather than accept the default of SHA256.
>>>
>>> This is pretty simple stuff, I'm sure, but I'm using this as a
>>> learning exercise more than anything, as there are duplicate file
>>> finders out in the world already.
>>>
>>> There are several problems with what I have put together so far,
>>> which is this:
>>>
>>>      Get-ChildItem c:\stuff -Recurse | select length, fullname |
>>> export-csv -NoTypeInformation c:\temp\files.csv
>>>      Import-CSV C:\temp\files.csv | ForEach-Object { (get-filehash
>>> -algorithm md5 $_.FullName) }; Length | Sort hash
>>>
>>> Using Length (or $_.Length) anywhere in the foreach statement gives
>>> an error, or gives weird output.
>>>
>>> Sample output when not using Length, and therefore getting reasonable
>>> output (extra spaces and hyphen delimiters elided):
>>>      Algorithm  Hash                              Path
>>>      MD5        592BE1AD0ED83C36D5E68CA7A014A510  C:\stuff\Tools\SomeFile.DOC
>>>
>>> What I'd like to see instead:
>>>      Hash                              Length  Path
>>>      592BE1AD0ED83C36D5E68CA7A014A510  79872   C:\stuff\Tools\SomeFile.DOC
>>>
>>> If anyone can offer some instruction, I'd appreciate it.
>>>
>>> Kurt
>
>
> ================================================
> Did you know you can also post and find answers on PowerShell in the forums?
> http://www.myitforum.com/forums/default.asp?catApp=1