OK - with that hint, I've solved that problem. Script has been updated to prompt for the directory with read-host and set a variable.
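
For anyone following along, here is a minimal sketch of that kind of change. It is not the actual updated script: the function name `Get-TargetDirectory` and the prompt text are hypothetical. It takes the directory as an argument and falls back to Read-Host only when none is supplied, then checks that the path really is a directory before anything is scanned.

```powershell
# Hypothetical helper (not from the original script): accept the directory
# as an argument, and prompt with Read-Host only when none was given.
function Get-TargetDirectory {
    param([string]$Path)

    if (-not $Path) {
        $Path = Read-Host -Prompt 'Directory to scan'
    }
    # Fail early instead of silently scanning the current directory.
    if (-not (Test-Path -LiteralPath $Path -PathType Container)) {
        throw "Not a directory: $Path"
    }
    $Path
}

# Usage: passing the path explicitly means no prompt appears.
$target = Get-TargetDirectory -Path ([System.IO.Path]::GetTempPath())
Get-ChildItem -LiteralPath $target -File | Select-Object -First 5 Length, FullName
```

Passing the path explicitly also avoids the situation described elsewhere in the thread, where an empty path variable made Get-ChildItem list the current directory instead.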
We'll see if that fixes the problem with missing hashes.

Kurt

On Thu, Aug 6, 2015 at 3:16 PM, Michael B. Smith <[email protected]> wrote:
> You don't use $_ you use $input.
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Kurt Buff
> Sent: Thursday, August 6, 2015 6:07 PM
> To: [email protected]
> Subject: Re: [powershell] Re: Need some pointers on an exercise I've set for
> myself
>
> Well, fine then! Don't execute scripts from the ISE... :)
>
> But, I've saved it to a .ps1 file, and am trying to run it in the
> regular shell, and am not seeing my expected results.
>
> Named DupeFileFinder.ps1, I execute it like so:
>
> c:\Batchfiles>"G:\Groups\Information Technology" | .\DupeFileFinder.ps1
> or
> c:\Batchfiles>"G:\Groups\Information Technology" |
> c:\batchfiles\dupfilefinder.ps1
>
> and get output regarding the files in c:\batchfiles, not about
> "g:\groups\information technology"
>
> The script currently looks like this (and as I get this polished up,
> I'll configure it to accept directories as a parameter - haven't
> gotten that far yet):
>
> ----------Begin DupeFileFinder.csv----------
> # Generate file.csv
> Get-ChildItem $_ -File -Recurse | select length, fullname | export-csv
> -NoTypeInformation c:\temp\fileList.csv
>
> # Generate filesWithHash.csv
> Import-CSV C:\temp\fileList.csv | Select-Object -Property
> @{Name="Hash";Expression={(get-filehash -algorithm md5 -literalPath
> $_.FullName).Hash}},Length,FullName | export-csv -NoTypeInformation
> c:\temp\fileListHashed.csv
>
> # Sort files ascending by Hash
> Import-CSV C:\temp\fileListHashed.csv | Sort-Object Hash | export-csv
> -NoTypeInformation c:\temp\FileListHashedSortedOnHash.csv
>
> # Extract non-unique files from the list
> Import-Csv C:\temp\FileListHashedSortedOnHash.csv | Group-Object
> -property Hash | Where-Object { $_.count -gt 1 } | Select -Expand
> Group | Export-Csv -NoTypeInformation c:\temp\fileDupesWithHash.csv
> ----------End DupeFileFinder.csv----------
>
> On Thu, Aug 6, 2015 at 2:19 PM, Michael B. Smith <[email protected]>
> wrote:
>> I and U are beside each other. :)
>>
>> Don’t use ISE.
>>
>> From: [email protected] [mailto:[email protected]]
>> On Behalf Of Webster
>> Sent: Thursday, August 6, 2015 4:17 PM
>> To: [email protected]
>> Subject: Re: [powershell] Re: Need some pointers on an exercise I've set for
>> myself
>>
>> Don't use use?
>>
>> Carl Webster
>> Consultant and Citrix Technology Professional
>> http://www.CarlWebster.com
>>
>> ________________________________
>>
>> From: [email protected] <[email protected]> on
>> behalf of Michael B. Smith <[email protected]>
>> Sent: Thursday, August 6, 2015 3:15 PM
>> To: [email protected]
>> Subject: RE: [powershell] Re: Need some pointers on an exercise I've set for
>> myself
>>
>> Don't use use. :-)
>>
>> Sent from my Windows Phone
>>
>> ________________________________
>>
>> From: Kurt Buff
>> Sent: 8/6/2015 1:09 PM
>> To: [email protected]
>> Subject: Re: [powershell] Re: Need some pointers on an exercise I've set for
>> myself
>>
>> Sorry, yes, when I said I ran it manually, I meant that I ran it from
>> the normal shell, not from the ISE.
>>
>> Kurt
>>
>> On Thu, Aug 6, 2015 at 12:50 PM, Michael B. Smith <[email protected]>
>> wrote:
>>> Do you get different behavior running it from the normal shell?
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Kurt Buff
>>> Sent: Thursday, August 6, 2015 2:20 PM
>>> To: [email protected]
>>> Subject: [powershell] Re: Need some pointers on an exercise I've set for
>>> myself
>>>
>>> Getting much closer...
>>>
>>> When running this line of code:
>>>
>>> Import-CSV C:\temp\IT-files.csv | Select-Object -Property
>>> @{Name="Hash";Expression={(get-filehash -algorithm md5 -literalPath
>>> $_.FullName).Hash}},Length,FullName | export-csv -NoTypeInformation
>>> c:\temp\IT-filehash.csv
>>>
>>> I get 18 files that don't get a hash (out of 22,727 files, so I'm not
>>> hugely fussed about it). So, out of curiosity, I ran get-filehash against
>>> them manually, that is, not as an entry in a CSV file.
>>>
>>> For one of them, I've identified why - someone has it open for writing,
>>> which once I think about it is not unexpected
>>>
>>> But, I'm not seeing error output in the ISE for that file, and for the
>>> rest, which is a bit strange, and for the files that aren't opened, and I
>>> manually do a get-filehash against them, I get a hash just fine.
>>>
>>> So, for grins, I ran it again from the ISE, against a CSV file containing
>>> only the headers and the list of files that didn't hash originally, I
>>> *still* don't get a hash, or an error code for the file that's open for
>>> write. The files that don't get a hash are just PDF and DOC files.
>>>
>>> Anyone run into anything like this?
>>>
>>> On Mon, Aug 3, 2015 at 5:25 PM, Kurt Buff <[email protected]> wrote:
>>>> Replying to myself, since that seems the reasonable thing to do here.
>>>>
>>>> I've tested the following against a smaller directory that I know has
>>>> some duplicates, and am getting progress.
>>>> Here is what I have so far (work with the line wraps!):
>>>>
>>>> Get-ChildItem S:\ -File -Recurse | select fullname, length |
>>>> Export-CSV -NoTypeInformation c:\temp\files.csv
>>>>
>>>> Import-CSV c:\temp\files.csv | Select-Object -Property
>>>> @{Name="MD5";Expression={(Get-Filehash -algorithm md5
>>>> $_.FullName).MD5}},Length,FullName | Export-CSV -NoTypeInformation
>>>> c:\temp\filehash.csv
>>>>
>>>> Import-CSV C:\temp\checker\fileMD5.csv | Sort-Object
>>>> @{Expression={$_.Length -as [int]}} | Export-CSV -NoTypeInformation
>>>> c:\temp\checker\FileMD5Sorted.csv
>>>>
>>>> The above generates a file of 315286 lines (not including header) - of
>>>> course, that's the number of files in the directory tree. I get output
>>>> that looks like this (work with the line wraps again):
>>>>
>>>> "MD5","Length","FullName"
>>>>
>>>> "6467C3875955DF4514395F0AFCAAA62A","3182604288","S:\Infrastructure\Microsoft\OSes\Win7EntSP1_64bit\SW_DVD5_SA_Win_Ent_7w_SP1_64BIT_English_-2_MLF_X17-58882.ISO"
>>>>
>>>> I noticed two oddities, however:
>>>>
>>>> o- zero-length files generate a hash, and of course the hash is the
>>>> same for all of them. I probably should have expected that, but it
>>>> surprised me.
>>>>
>>>> o- I find a handful of files (22 of them) at the top of the csv file
>>>> after sorting that don't seem to obey the sorting on the hash that the
>>>> other files followed. It's very strange. They're not duplicates of any
>>>> other files; their hashes and file sizes are out of sort order from
>>>> all of the rest, AFAICT. I'm not sure what to make of that.
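
One possible explanation for the second oddity, which nobody in the thread confirms: the sort key is `$_.Length -as [int]`, and `-as` returns $null rather than an error when a value does not fit the target type. A length such as 3182604288 is larger than a 32-bit [int] can hold, so files over roughly 2 GB would all get a $null key. The sketch below uses made-up rows (the file names are hypothetical) to show the conversion and a 64-bit sort key as an alternative.

```powershell
# Hypothetical rows standing in for lines of the CSV.
# Import-Csv yields every field as a string, so Length is text here too.
$rows = @(
    [pscustomobject]@{ Length = '3182604288'; FullName = 'big.iso' }
    [pscustomobject]@{ Length = '79872';      FullName = 'small.doc' }
    [pscustomobject]@{ Length = '1024';       FullName = 'tiny.txt' }
)

# -as yields $null when the conversion fails; it does not throw.
$asInt  = '3182604288' -as [int]    # too large for a 32-bit integer
$asLong = '3182604288' -as [long]   # fits in a 64-bit integer

# Sorting on a 64-bit key keeps large files in numeric order.
$sorted = $rows | Sort-Object { [long]$_.Length }
$sorted | Format-Table Length, FullName
```

Note also that the pipeline quoted above sorts on Length, not on the hash, so hash order in the output would be incidental either way.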
>>>>
>>>> But, ignoring those two things, I'd like to proceed a bit further:
>>>>
>>>> o- Writing to another file only those lines that are duplicate files,
>>>> which I can do by selecting the lines that have matching
>>>> hashes (and possibly also matching sizes)
>>>>
>>>> o- Possibly adding another column, which would contain an integer that
>>>> would increment for each set of matched files, which would probably
>>>> lead to...
>>>>
>>>> o- Among other things, calculating the amount of duplicated space (sum
>>>> of n-1 file sizes for each set of dupes), identifying duplicate
>>>> directories that can be eliminated in toto, etc.
>>>>
>>>> But, I'm stymied on the execution of the logic. I'm such an
>>>> inexperienced programmer that I'm flailing on the first of these
>>>> steps. I believe I need to make a stepwise comparison of the MD5
>>>> column, which I think would look something like this:
>>>>
>>>> $dupe = 1
>>>> read infile.line1 into variable1
>>>> read infile.line2 into variable2
>>>> if {
>>>>     variable1.MD5 -eq variable2.MD5
>>>>     prefix variable1 with dupe counter
>>>>     write variable1 to the new csv file
>>>>     while not eof
>>>>         set variable1 to the contents of variable2
>>>>         read line next into variable2
>>>>         compare variable1.MD5 to variable2.MD5
>>>>         if match
>>>>             prefix variable1 with $dupe
>>>>             append variable1 as new line of new csv file
>>>>         else
>>>>             increment dupe counter
>>>>     endwhile }
>>>> else {
>>>>     while not eof
>>>>         set variable1 to the contents of variable2
>>>>         read line next into variable2
>>>>         compare variable1.MD5 to variable2.MD5
>>>>         if match
>>>>             prefix variable1 with $dupe
>>>>             append variable1 as new line of new csv file
>>>>         else
>>>>             increment dupe counter
>>>>     endwhile
>>>>
>>>> I realize I could be way off base on the algorithm here, but that's
>>>> what I've been able to dream up.
>>>>
>>>> Anyone care to critique and offer syntax suggestions - my googlefu is
>>>> about exhausted.
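
The three goals listed above can be sketched without a line-by-line comparison by letting Group-Object collect rows that share a hash. This is only one way to do it, under assumptions: the sample rows, the `DupeSet` column name, and the variable names are hypothetical, and the rows stand in for what Import-Csv would return from the hashed file.

```powershell
# Hypothetical rows in the shape of the hashed CSV (Hash, Length, FullName).
$files = @(
    [pscustomobject]@{ Hash = 'AAA'; Length = '100'; FullName = 'C:\a\one.txt' }
    [pscustomobject]@{ Hash = 'BBB'; Length = '50';  FullName = 'C:\a\two.txt' }
    [pscustomobject]@{ Hash = 'AAA'; Length = '100'; FullName = 'C:\b\one-copy.txt' }
    [pscustomobject]@{ Hash = 'AAA'; Length = '100'; FullName = 'C:\c\one-copy2.txt' }
)

$set    = 0          # counter that increments once per set of duplicates
$wasted = [long]0    # running total of redundant bytes

# Keep only hashes that occur more than once, then tag each member of a set.
$groups = $files | Group-Object -Property Hash | Where-Object { $_.Count -gt 1 }
$dupes = foreach ($group in $groups) {
    $set++
    # n-1 copies in each set are redundant; Length is a string, so cast it.
    $wasted += ($group.Count - 1) * [long]$group.Group[0].Length
    foreach ($file in $group.Group) {
        [pscustomobject]@{
            DupeSet  = $set
            Hash     = $file.Hash
            Length   = $file.Length
            FullName = $file.FullName
        }
    }
}

$dupes | Format-Table -AutoSize
"Duplicated space: $wasted bytes"
```

In real use, `$dupes` could be piped to Export-Csv -NoTypeInformation in place of Format-Table. Grouping holds all rows in memory at once, which may matter for a tree with millions of files.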
>>>>
>>>> Kurt
>>>>
>>>> On Thu, Jul 30, 2015 at 12:45 PM, Kurt Buff <[email protected]> wrote:
>>>>> I'm putting together what should be a simple little script, and failing.
>>>>>
>>>>> I am ultimately looking to run this against a directory, then sort
>>>>> the output on the hash field and then parse for duplicates. There are
>>>>> two conditions that concern me: 1) there are over 3m files in the
>>>>> target directory, and 2) many of the files are quite large, over 1g.
>>>>>
>>>>> I'm more concerned about the effects of the script on memory than on
>>>>> processor - the data is fairly static, and I intend to run it once a
>>>>> month or even less, but I did choose MD5 as the hash algorithm for
>>>>> speed, rather than accept the default of SHA256.
>>>>>
>>>>> This is pretty simple stuff, I'm sure, but I'm using this as a
>>>>> learning exercise more than anything, as there are duplicate file
>>>>> finders out in the world already.
>>>>>
>>>>> There are several problems with what I have put together so far,
>>>>> which is this:
>>>>>
>>>>> Get-ChildItem c:\stuff -Recurse | select length, fullname |
>>>>> export-csv -NoTypeInformation c:\temp\files.csv
>>>>> Import-CSV C:\temp\files.csv | ForEach-Object { (get-filehash
>>>>> -algorithm md5 $_.FullName) }; Length | Sort hash
>>>>>
>>>>> Using Length (or $_.Length) anywhere in the foreach statement gives
>>>>> an error, or gives weird output.
>>>>>
>>>>> Sample Output when not using Length, and therefore getting reasonable
>>>>> output (extra spaces and hyphen delimiters elided):
>>>>> Algorithm  Hash                              Path
>>>>> MD5        592BE1AD0ED83C36D5E68CA7A014A510  C:\stuff\Tools\SomeFile.DOC
>>>>>
>>>>> What I'd like to see instead
>>>>> Hash                              Length  Path
>>>>> 592BE1AD0ED83C36D5E68CA7A014A510  79872   C:\stuff\Tools\SomeFile.DOC
>>>>>
>>>>> If anyone can offer some instruction, I'd appreciate it.
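
A short sketch of one way to get the Hash / Length / Path layout asked for above. Two things seem to be going on in the quoted attempt: the `;` ends the pipeline, so `Length` is run as a separate command, and the objects Get-FileHash emits carry only Algorithm, Hash, and Path. Using Select-Object with calculated properties keeps the file's own Length next to the computed hash. The temporary directory and file names below are hypothetical, created only so the example runs on its own.

```powershell
# Build a throwaway directory with two identical files (hypothetical names).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -LiteralPath (Join-Path $dir 'a.txt') -Value 'hello'
Set-Content -LiteralPath (Join-Path $dir 'b.txt') -Value 'hello'

# $_ is the file object here, so Length and FullName are still available
# alongside the hash computed in the first calculated property.
$result = Get-ChildItem -LiteralPath $dir -File -Recurse |
    Select-Object @{ Name = 'Hash'; Expression = { (Get-FileHash -Algorithm MD5 -LiteralPath $_.FullName).Hash } },
                  Length,
                  @{ Name = 'Path'; Expression = { $_.FullName } } |
    Sort-Object Hash

$result | Format-Table -AutoSize

# Clean up the throwaway directory.
Remove-Item -LiteralPath $dir -Recurse -Force
```

The same Select-Object stage works after Import-Csv as well, as long as the CSV rows still have FullName and Length columns.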
>>>>>
>>>>> Kurt

================================================
Did you know you can also post and find answers on PowerShell in the forums?
http://www.myitforum.com/forums/default.asp?catApp=1
