Re: [powershell] Re: Need some pointers on an exercise I've set for myself

Kurt Buff Thu, 06 Aug 2015 17:13:34 -0700

OK - with that hint, I've solved that problem. Script has been updated
to prompt for the directory with read-host and set a variable.
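
A minimal sketch of what that change could look like. The function name Get-FileList, the -Path parameter, and the prompt text are placeholders for illustration, not the actual script:

```powershell
# Sketch only: Get-FileList, -Path and the prompt text are assumed names.
function Get-FileList {
    param(
        # Directory to scan. The Read-Host default is only evaluated
        # when no -Path argument is supplied.
        [string]$Path = (Read-Host 'Directory to scan')
    )
    # -LiteralPath keeps characters such as [ ] from being read as wildcards.
    Get-ChildItem -LiteralPath $Path -File -Recurse |
        Select-Object Length, FullName
}

# Interactive use (prompts for the directory):
# Get-FileList | Export-Csv -NoTypeInformation c:\temp\fileList.csv
```

Putting the prompt in a parameter default keeps the interactive behavior while still allowing the directory to be passed on the command line.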


We'll see if that fixes the problem with missing hashes.
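
One way to see why a hash goes missing is to hash each file inside try/catch and keep the error text next to the row. As far as I know, errors raised inside a Select-Object calculated property are discarded and the property is simply left empty, which would match the silent blanks described below. This is a sketch under that assumption. Get-HashRow and the HashError column are made-up names:

```powershell
# Sketch only: Get-HashRow and the HashError column are assumed names.
function Get-HashRow {
    param([Parameter(Mandatory = $true)][string]$FullName)
    $hash = $null
    $err  = $null
    try {
        # -ErrorAction Stop turns a non-terminating error (locked or
        # missing file) into one that catch can see.
        $hash = (Get-FileHash -Algorithm MD5 -LiteralPath $FullName -ErrorAction Stop).Hash
    }
    catch {
        $err = $_.Exception.Message
    }
    [pscustomobject]@{
        Hash      = $hash
        HashError = $err
        FullName  = $FullName
    }
}

# Possible use against the existing list:
# Import-Csv C:\temp\fileList.csv | ForEach-Object { Get-HashRow $_.FullName } |
#     Export-Csv -NoTypeInformation c:\temp\fileListHashed.csv
```

Rows with an empty Hash then carry the reason in HashError instead of vanishing quietly.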

Kurt

On Thu, Aug 6, 2015 at 3:16 PM, Michael B. Smith <[email protected]> wrote:
> You don't use $_ you use $input.
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] 
> On Behalf Of Kurt Buff
> Sent: Thursday, August 6, 2015 6:07 PM
> To: [email protected]
> Subject: Re: [powershell] Re: Need some pointers on an exercise I've set for 
> myself
>
> Well, fine then! Don't execute scripts from the ISE... :)
>
> But, I've saved it to a .ps1 file, and am trying to run it in the
> regular shell, and am not seeing my expected results.
>
> Named DupeFileFinder.ps1, I execute it like so:
>
> c:\Batchfiles>"G:\Groups\Information Technology" | .\DupeFileFinder.ps1
> or
> c:\Batchfiles>"G:\Groups\Information Technology" |
> c:\batchfiles\DupeFileFinder.ps1
>
> and get output regarding the files in c:\batchfiles, not about
> "g:\groups\information technology"
>
> The script currently looks like this (and as I get this polished up,
> I'll configure it to accept directories as a parameter - haven't
> gotten that far yet):
>
> ----------Begin DupeFileFinder.ps1----------
> # Generate fileList.csv
> Get-ChildItem $_ -File -Recurse | select length, fullname |
>     export-csv -NoTypeInformation c:\temp\fileList.csv
>
> # Generate fileListHashed.csv
> Import-CSV C:\temp\fileList.csv |
>     Select-Object -Property @{Name="Hash";Expression={
>         (get-filehash -algorithm md5 -literalPath $_.FullName).Hash}},
>         Length,FullName |
>     export-csv -NoTypeInformation c:\temp\fileListHashed.csv
>
> # Sort files ascending by Hash
> Import-CSV C:\temp\fileListHashed.csv | Sort-Object Hash |
>     export-csv -NoTypeInformation c:\temp\FileListHashedSortedOnHash.csv
>
> # Extract non-unique files from the list
> Import-Csv C:\temp\FileListHashedSortedOnHash.csv |
>     Group-Object -property Hash | Where-Object { $_.count -gt 1 } |
>     Select -Expand Group |
>     Export-Csv -NoTypeInformation c:\temp\fileDupesWithHash.csv
> ----------End DupeFileFinder.ps1----------
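
Inline note on the $input hint: at the top level of a script, piped values are available through the automatic variable $input, while $_ is not set, which would fit the script falling back to the current directory as described above. A small sketch, with a script block standing in for the .ps1 file (the variable names are placeholders):

```powershell
# A script block stands in for the .ps1 file here; both receive piped
# values through the automatic variable $input rather than $_.
$listFiles = {
    foreach ($dir in $input) {
        Get-ChildItem -LiteralPath $dir -File -Recurse |
            Select-Object Length, FullName
    }
}

# Usage, mirroring the invocation quoted above:
# 'G:\Groups\Information Technology' | & $listFiles
```

A param() block with a named directory parameter is the more conventional route once the script is polished, as already planned in the message.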
>
> On Thu, Aug 6, 2015 at 2:19 PM, Michael B. Smith <[email protected]> 
> wrote:
>> I and U are beside each other. :-)
>>
>>
>>
>> Don’t use ISE.
>>
>>
>>
>> From: [email protected] [mailto:[email protected]]
>> On Behalf Of Webster
>> Sent: Thursday, August 6, 2015 4:17 PM
>>
>>
>> To: [email protected]
>> Subject: Re: [powershell] Re: Need some pointers on an exercise I've set for
>> myself
>>
>>
>>
>> Don't use use?
>>
>>
>>
>>
>>
>> Carl Webster
>>
>> Consultant and Citrix Technology Professional
>>
>> http://www.CarlWebster.com
>>
>>
>>
>> ________________________________
>>
>> From: [email protected] <[email protected]> on
>> behalf of Michael B. Smith <[email protected]>
>> Sent: Thursday, August 6, 2015 3:15 PM
>> To: [email protected]
>> Subject: RE: [powershell] Re: Need some pointers on an exercise I've set for
>> myself
>>
>>
>>
>> Don't use use. :-)
>>
>> Sent from my Windows Phone
>>
>> ________________________________
>>
>> From: Kurt Buff
>> Sent: 8/6/2015 1:09 PM
>> To: [email protected]
>> Subject: Re: [powershell] Re: Need some pointers on an exercise I've set for
>> myself
>>
>> Sorry, yes, when I said I ran it manually, I meant that I ran it from
>> the normal shell, not from the ISE.
>>
>> Kurt
>>
>> On Thu, Aug 6, 2015 at 12:50 PM, Michael B. Smith <[email protected]>
>> wrote:
>>> Do you get different behavior running it from the normal shell?
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Kurt Buff
>>> Sent: Thursday, August 6, 2015 2:20 PM
>>> To: [email protected]
>>> Subject: [powershell] Re: Need some pointers on an exercise I've set for
>>> myself
>>>
>>> Getting much closer...
>>>
>>> When running this line of code:
>>>
>>> Import-CSV C:\temp\IT-files.csv |
>>>     Select-Object -Property @{Name="Hash";Expression={
>>>         (get-filehash -algorithm md5 -literalPath $_.FullName).Hash}},
>>>         Length,FullName |
>>>     export-csv -NoTypeInformation c:\temp\IT-filehash.csv
>>>
>>> I get 18 files that don't get a hash (out of 22,727 files, so I'm not
>>> hugely fussed about it). So, out of curiosity, I ran get-filehash against
>>> them manually, that is, not as an entry in a CSV file.
>>>
>>> For one of them, I've identified why - someone has it open for writing,
>>> which, once I think about it, is not unexpected.
>>>
>>> But I'm not seeing error output in the ISE for that file or for the
>>> rest, which is a bit strange. When I manually run get-filehash against
>>> the files that aren't open, I get a hash just fine.
>>>
>>> So, for grins, I ran it again from the ISE against a CSV file containing
>>> only the headers and the list of files that didn't hash originally. I
>>> *still* don't get a hash, or an error code for the file that's open for
>>> write. The files that don't get a hash are just PDF and DOC files.
>>>
>>> Anyone run into anything like this?
>>>
>>> On Mon, Aug 3, 2015 at 5:25 PM, Kurt Buff <[email protected]> wrote:
>>>> Replying to myself, since that seems the reasonable thing to do here.
>>>>
>>>> I've tested the following against a smaller directory that I know has
>>>> some duplicates, and am getting progress. Here is what I have so far
>>>> (work with the line wraps!):
>>>>
>>>> Get-ChildItem S:\ -File -Recurse | select fullname, length |
>>>>     Export-CSV -NoTypeInformation c:\temp\files.csv
>>>>
>>>> Import-CSV c:\temp\files.csv |
>>>>     Select-Object -Property @{Name="MD5";Expression={
>>>>         (Get-Filehash -algorithm md5 $_.FullName).Hash}},
>>>>         Length,FullName |
>>>>     Export-CSV -NoTypeInformation c:\temp\filehash.csv
>>>>
>>>> Import-CSV C:\temp\checker\fileMD5.csv |
>>>>     Sort-Object @{Expression={$_.Length -as [int]}} |
>>>>     Export-CSV -NoTypeInformation c:\temp\checker\FileMD5Sorted.csv
>>>>
>>>> The above generates a file of 315286 lines (not including header) - of
>>>> course, that's the number of files in the directory tree. I get output
>>>> that looks like this (work with the line wraps again):
>>>>
>>>> "MD5","Length","FullName"
>>>>
>>>> "6467C3875955DF4514395F0AFCAAA62A","3182604288","S:\Infrastructure\Microsoft\OSes\Win7EntSP1_64bit\SW_DVD5_SA_Win_Ent_7w_SP1_64BIT_English_-2_MLF_X17-58882.ISO"
>>>>
>>>> I noticed two oddities, however:
>>>>
>>>> o- zero-length files generate a hash, and of course the hash is the
>>>> same for all of them. I probably should have expected that, but it
>>>> surprised me.
>>>>
>>>> o- I find a handful of files (22 of them) at the top of the csv file
>>>> after sorting that don't seem to obey the sorting on the hash that the
>>>> other files followed. It's very strange. They're not duplicates of any
>>>> other files; their hashes and file sizes are out of sort order from
>>>> all of the rest, AFAICT. I'm not sure what to make of that.
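
Inline note on the second oddity: a likely cause, assuming those 22 files are all larger than 2 GB like the ISO in the sample row, is the `-as [int]` cast in the Sort-Object expression. A length above 2,147,483,647 does not fit in a 32-bit integer, so the cast yields $null, and rows whose sort key is $null would then land together outside the numeric order. A quick check:

```powershell
# Import-Csv yields strings, so Length has to be cast before a numeric sort.
$length = '3182604288'          # size of the ISO in the sample row above

$asInt  = $length -as [int]     # out of Int32 range, so -as yields $null
$asLong = $length -as [long]    # fits in a 64-bit integer

# Same sort as above, with [long] substituted for [int]:
# Import-CSV C:\temp\checker\fileMD5.csv |
#     Sort-Object @{Expression={$_.Length -as [long]}} |
#     Export-CSV -NoTypeInformation c:\temp\checker\FileMD5Sorted.csv
```

If the stray rows really are the files over 2 GB, switching the cast to [long] should put them back in order.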
>>>>
>>>> But, ignoring those two things, I'd like to proceed a bit further:
>>>>
>>>> o- Writing to another file only those lines that are duplicate files,
>>>> which I can do by selecting the lines that have matching
>>>> hashes (and possibly also matching sizes)
>>>>
>>>> o- Possibly adding another column, which would contain an integer that
>>>> would increment for each set of matched files, which would probably
>>>> lead to...
>>>>
>>>> o- Among other things, calculating the amount of duplicated space (sum
>>>> of n-1 file sizes for each set of dupes), identifying duplicate
>>>> directories that can be eliminated in toto, etc.
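
Inline sketch of the set counter and the duplicated-space figure from the list above, using Group-Object instead of a line-by-line comparison. Get-DupeSummary, its parameters, and the output column names are invented for illustration. The default column name MD5 matches the CSV header shown earlier in this message:

```powershell
# Sketch only: Get-DupeSummary and its output columns are assumed names.
function Get-DupeSummary {
    param(
        [Parameter(Mandatory = $true)][object[]]$Rows,
        # Name of the CSV column that holds the hash.
        [string]$HashProperty = 'MD5'
    )
    $set = 0
    $Rows | Group-Object -Property $HashProperty |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            $set++   # one integer per set of matching files
            # Import-Csv yields strings, so cast Length to a 64-bit integer.
            $size = [long]$_.Group[0].Length
            [pscustomobject]@{
                DupeSet     = $set
                Hash        = $_.Name
                Copies      = $_.Count
                # n-1 copies of the same size are redundant.
                WastedBytes = $size * ($_.Count - 1)
            }
        }
}

# Possible use:
# Get-DupeSummary -Rows (Import-Csv c:\temp\filehash.csv) |
#     Measure-Object WastedBytes -Sum
```

Files with equal hashes are assumed to have equal lengths, so the size of the first file in each group stands in for the whole set.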
>>>>
>>>> But, I'm stymied on the execution of the logic. I'm such an
>>>> inexperienced programmer that I'm flailing on the first of these
>>>> steps. I believe I need to make a stepwise comparison of the MD5
>>>> column, which I think would look something like this:
>>>>
>>>>      # Stepwise comparison; the rows must be sorted on MD5 first
>>>>      $infile  = 'c:\temp\checker\FileMD5Sorted.csv'
>>>>      $outfile = 'c:\temp\checker\FileDupes.csv'   # placeholder name
>>>>      $rows = @(Import-Csv $infile | Sort-Object MD5)
>>>>      $dupe = 1   # counter, one value per set of matched files
>>>>      $out = for ($i = 0; $i -lt $rows.Count; $i++) {
>>>>           $cur  = $rows[$i].MD5
>>>>           $prev = if ($i -gt 0) { $rows[$i - 1].MD5 }
>>>>           $next = if ($i + 1 -lt $rows.Count) { $rows[$i + 1].MD5 }
>>>>           if ($cur -eq $prev -or $cur -eq $next) {
>>>>                # match: prefix the row with the dupe counter
>>>>                $rows[$i] | Select-Object @{Name='Dupe';Expression={$dupe}},
>>>>                     MD5, Length, FullName
>>>>                # last row of this set: increment the dupe counter
>>>>                if ($cur -ne $next) { $dupe++ }
>>>>           }
>>>>      }
>>>>      # write the matched rows to the new csv file
>>>>      $out | Export-Csv -NoTypeInformation $outfile
>>>>
>>>> I realize I could be way off base on the algorithm here, but that's
>>>> what I've been able to dream up.
>>>>
>>>> Anyone care to critique and offer syntax suggestions - my googlefu is
>>>> about exhausted.
>>>>
>>>> Kurt
>>>>
>>>> On Thu, Jul 30, 2015 at 12:45 PM, Kurt Buff <[email protected]> wrote:
>>>>> I'm putting together what should be a simple little script, and failing.
>>>>>
>>>>> I am ultimately looking to run this against a directory, then sort
>>>>> the output on the hash field and then parse for duplicates. There are
>>>>> two conditions that concern me: 1) there are over 3 million files in
>>>>> the target directory, and 2) many of the files are quite large, over
>>>>> 1 GB.
>>>>>
>>>>> I'm more concerned about the effects of the script on memory than on
>>>>> processor - the data is fairly static, and I intend to run it once a
>>>>> month or even less, but I did choose MD5 as the hash algorithm for
>>>>> speed, rather than accept the default of SHA256.
>>>>>
>>>>> This is pretty simple stuff, I'm sure, but I'm using this as a
>>>>> learning exercise more than anything, as there are duplicate file
>>>>> finders out in the world already.
>>>>>
>>>>> There are several problems with what I have put together so far,
>>>>> which is this:
>>>>>
>>>>>      Get-ChildItem c:\stuff -Recurse | select length, fullname |
>>>>> export-csv -NoTypeInformation c:\temp\files.csv
>>>>>      Import-CSV C:\temp\files.csv | ForEach-Object { (get-filehash
>>>>> -algorithm md5 $_.FullName) }; Length | Sort hash
>>>>>
>>>>> Using Length (or $_.Length) anywhere in the foreach statement gives
>>>>> an error, or gives weird output.
>>>>>
>>>>> Sample Output when not using Length, and therefore getting reasonable
>>>>> output (extra spaces and hyphen delimiters elided):
>>>>>      Algorithm   Hash                               Path
>>>>>      MD5         592BE1AD0ED83C36D5E68CA7A014A510   C:\stuff\Tools\SomeFile.DOC
>>>>>
>>>>> What I'd like to see instead:
>>>>>      Hash                               Length   Path
>>>>>      592BE1AD0ED83C36D5E68CA7A014A510   79872    C:\stuff\Tools\SomeFile.DOC
>>>>>
>>>>> If anyone can offer some instruction, I'd appreciate it.
>>>>>
>>>>> Kurt
>>>
>>>
>>> ================================================
>>> Did you know you can also post and find answers on PowerShell in the
>>> forums?
>>> http://www.myitforum.com/forums/default.asp?catApp=1
