Out-freaking-standing. Now I can move on to the next task.
Thank you again. I'll be back...

Kurt

On Thu, Aug 6, 2015 at 5:55 PM, Michael B. Smith <[email protected]> wrote:
> Add "-Encoding Unicode" to both Export-Csv and Import-Csv each time you use
> them.
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Kurt Buff
> Sent: Thursday, August 6, 2015 8:42 PM
> To: [email protected]
> Subject: Re: [powershell] Re: Need some pointers on an exercise I've set for
> myself
>
> Well, nuts.
>
> That didn't fix it.
>
> Oh, wait!
>
> I found something...
>
> The files that aren't getting hashed have question marks in the
> file/directory name somewhere - but not always in the same place, it's mixed
> between the file name and the directory name(s) - sometimes one, and
> sometimes the other. That question mark is a translation of an e with an
> accent grave.
>
> Anyone have thoughts on getting around that?
>
> Kurt
>
> On Thu, Aug 6, 2015 at 5:11 PM, Kurt Buff <[email protected]> wrote:
>> OK - with that hint, I've solved that problem. Script has been updated
>> to prompt for the directory with read-host and set a variable.
>>
>> We'll see if that fixes the problem with missing hashes
>>
>> Kurt
>>
>> On Thu, Aug 6, 2015 at 3:16 PM, Michael B. Smith <[email protected]>
>> wrote:
>>> You don't use $_ you use $input.
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Kurt Buff
>>> Sent: Thursday, August 6, 2015 6:07 PM
>>> To: [email protected]
>>> Subject: Re: [powershell] Re: Need some pointers on an exercise I've
>>> set for myself
>>>
>>> Well, fine then! Don't execute scripts from the ISE... :)
>>>
>>> But, I've saved it to a .ps1 file, and am trying to run it in the
>>> regular shell, and am not seeing my expected results.
>>>
>>> Named DupeFileFinder.ps1, I execute it like so:
>>>
>>> c:\Batchfiles>"G:\Groups\Information Technology" | .\DupeFileFinder.ps1
>>> or
>>> c:\Batchfiles>"G:\Groups\Information Technology" | c:\batchfiles\dupfilefinder.ps1
>>>
>>> and get output regarding the files in c:\batchfiles, not about
>>> "g:\groups\information technology"
>>>
>>> The script currently looks like this (and as I get this polished up,
>>> I'll configure it to accept directories as a parameter - haven't
>>> gotten that far yet):
>>>
>>> ----------Begin DupeFileFinder.csv----------
>>> # Generate file.csv
>>> Get-ChildItem $_ -File -Recurse | select length, fullname |
>>> export-csv -NoTypeInformation c:\temp\fileList.csv
>>>
>>> # Generate filesWithHash.csv
>>> Import-CSV C:\temp\fileList.csv | Select-Object -Property
>>> @{Name="Hash";Expression={(get-filehash -algorithm md5 -literalPath
>>> $_.FullName).Hash}},Length,FullName | export-csv -NoTypeInformation
>>> c:\temp\fileListHashed.csv
>>>
>>> # Sort files ascending by Hash
>>> Import-CSV C:\temp\fileListHashed.csv | Sort-Object Hash | export-csv
>>> -NoTypeInformation c:\temp\FileListHashedSortedOnHash.csv
>>>
>>> # Extract non-unique files from the list
>>> Import-Csv C:\temp\FileListHashedSortedOnHash.csv | Group-Object -property Hash
>>> | Where-Object { $_.count -gt 1 } | Select -Expand Group | Export-Csv
>>> -NoTypeInformation c:\temp\fileDupesWithHash.csv
>>> ----------End DupeFileFinder.csv----------
>>>
>>> On Thu, Aug 6, 2015 at 2:19 PM, Michael B. Smith <[email protected]>
>>> wrote:
>>>> I and U are beside each other. :-)
>>>>
>>>> Don’t use ISE.
>>>>
>>>> From: [email protected]
>>>> [mailto:[email protected]]
>>>> On Behalf Of Webster
>>>> Sent: Thursday, August 6, 2015 4:17 PM
>>>> To: [email protected]
>>>> Subject: Re: [powershell] Re: Need some pointers on an exercise I've
>>>> set for myself
>>>>
>>>> Don't use use?
>>>>
>>>> Carl Webster
>>>>
>>>> Consultant and Citrix Technology Professional
>>>>
>>>> http://www.CarlWebster.com
>>>>
>>>> ________________________________
>>>>
>>>> From: [email protected]
>>>> <[email protected]> on behalf of Michael B. Smith
>>>> <[email protected]>
>>>> Sent: Thursday, August 6, 2015 3:15 PM
>>>> To: [email protected]
>>>> Subject: RE: [powershell] Re: Need some pointers on an exercise I've
>>>> set for myself
>>>>
>>>> Don't use use. :-)
>>>>
>>>> Sent from my Windows Phone
>>>>
>>>> ________________________________
>>>>
>>>> From: Kurt Buff
>>>> Sent: 8/6/2015 1:09 PM
>>>> To: [email protected]
>>>> Subject: Re: [powershell] Re: Need some pointers on an exercise I've
>>>> set for myself
>>>>
>>>> Sorry, yes, when I said I ran it manually, I meant that I ran it
>>>> from the normal shell, not from the ISE.
>>>>
>>>> Kurt
>>>>
>>>> On Thu, Aug 6, 2015 at 12:50 PM, Michael B. Smith
>>>> <[email protected]>
>>>> wrote:
>>>>> Do you get different behavior running it from the normal shell?
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of Kurt Buff
>>>>> Sent: Thursday, August 6, 2015 2:20 PM
>>>>> To: [email protected]
>>>>> Subject: [powershell] Re: Need some pointers on an exercise I've
>>>>> set for myself
>>>>>
>>>>> Getting much closer...
>>>>>
>>>>> When running this line of code:
>>>>>
>>>>> Import-CSV C:\temp\IT-files.csv | Select-Object -Property
>>>>> @{Name="Hash";Expression={(get-filehash -algorithm md5 -literalPath
>>>>> $_.FullName).Hash}},Length,FullName | export-csv -NoTypeInformation
>>>>> c:\temp\IT-filehash.csv
>>>>>
>>>>> I get 18 files that don't get a hash (out of 22,727 files, so I'm
>>>>> not hugely fussed about it). So, out of curiosity, I ran
>>>>> get-filehash against them manually, that is, not as an entry in a CSV
>>>>> file.
>>>>>
>>>>> For one of them, I've identified why - someone has it open for
>>>>> writing, which once I think about it is not unexpected
>>>>>
>>>>> But, I'm not seeing error output in the ISE for that file, and for
>>>>> the rest, which is a bit strange, and for the files that aren't
>>>>> opened, and I manually do a get-filehash against them, I get a hash just
>>>>> fine.
>>>>>
>>>>> So, for grins, I ran it again from the ISE, against a CSV file
>>>>> containing only the headers and the list of files that didn't hash
>>>>> originally, I
>>>>> *still* don't get a hash, or an error code for the file that's open
>>>>> for write. The files that don't get a hash are just PDF and DOC files.
>>>>>
>>>>> Anyone run into anything like this?
>>>>>
>>>>> On Mon, Aug 3, 2015 at 5:25 PM, Kurt Buff <[email protected]> wrote:
>>>>>> Replying to myself, since that seems the reasonable thing to do here.
>>>>>>
>>>>>> I've tested the following against a smaller directory that I know
>>>>>> has some duplicates, and am getting progress. Here is what I have
>>>>>> so far (work with the line wraps!):
>>>>>>
>>>>>> Get-ChildItem S:\ -File -Recurse | select fullname, length |
>>>>>> Export-CSV -NoTypeInformation c:\temp\files.csv
>>>>>>
>>>>>> Import-CSV c:\temp\files.csv | Select-Object -Property
>>>>>> @{Name="MD5";Expression={(Get-Filehash -algorithm md5
>>>>>> $_.FullName).Hash}},Length,FullName | Export-CSV -NoTypeInformation
>>>>>> c:\temp\filehash.csv
>>>>>>
>>>>>> Import-CSV C:\temp\checker\fileMD5.csv | Sort-Object
>>>>>> @{Expression={$_.Length -as [int]}} | Export-CSV
>>>>>> -NoTypeInformation c:\temp\checker\FileMD5Sorted.csv
>>>>>>
>>>>>> The above generates a file of 315286 lines (not including header)
>>>>>> - of course, that's the number of files in the directory tree.
>>>>>> I get output that looks like this (work with the line wraps again):
>>>>>>
>>>>>> "MD5","Length","FullName"
>>>>>>
>>>>>> "6467C3875955DF4514395F0AFCAAA62A","3182604288","S:\Infrastructure\Microsoft\OSes\Win7EntSP1_64bit\SW_DVD5_SA_Win_Ent_7w_SP1_64BIT_English_-2_MLF_X17-58882.ISO"
>>>>>>
>>>>>> I noticed two oddities, however:
>>>>>>
>>>>>> o- zero-length files generate a hash, and of course the hash is
>>>>>> the same for all of them. I probably should have expected that,
>>>>>> but it surprised me.
>>>>>>
>>>>>> o- I find a handful of files (22 of them) at the top of the csv
>>>>>> file after sorting that don't seem to obey the sorting on the hash
>>>>>> that the other files followed. It's very strange. They're not
>>>>>> duplicates of any other files; their hashes and file sizes are out
>>>>>> of sort order from all of the rest, AFAICT. I'm not sure what to make of
>>>>>> that.
>>>>>>
>>>>>> But, ignoring those two things, I'd like to proceed a bit further:
>>>>>>
>>>>>> o- Writing to another file only those lines that are duplicate
>>>>>> files, which I can do by selecting the lines that have
>>>>>> matching hashes (and possibly also matching sizes)
>>>>>>
>>>>>> o- Possibly adding another column, which would contain an integer
>>>>>> that would increment for each set of matched files, which would
>>>>>> probably lead to...
>>>>>>
>>>>>> o- Among other things, calculating the amount of duplicated space
>>>>>> (sum of n-1 file sizes for each set of dupes), identifying
>>>>>> duplicate directories that can be eliminated in toto, etc.
>>>>>>
>>>>>> But, I'm stymied on the execution of the logic. I'm such an
>>>>>> inexperienced programmer that I'm flailing on the first of these
>>>>>> steps.
>>>>>> I believe I need to make a stepwise comparison of the MD5
>>>>>> column, which I think would look something like this:
>>>>>>
>>>>>> $dupe = 1
>>>>>> read infile.line1 into variable1
>>>>>> read infile.line2 into variable2
>>>>>> if {
>>>>>>     variable1.MD5 -eq variable2.MD5
>>>>>>     prefix variable1 with dupe counter
>>>>>>     write variable1 to the new csv file
>>>>>>     while not eof
>>>>>>         set variable1 to the contents of variable2
>>>>>>         read line next into variable2
>>>>>>         compare variable1.MD5 to variable2.MD5
>>>>>>         if match
>>>>>>             prefix variable1 with $dupe
>>>>>>             append variable1 as new line of new csv file
>>>>>>         else
>>>>>>             increment dupe counter
>>>>>>     endwhile }
>>>>>> else {
>>>>>>     while not eof
>>>>>>         set variable1 to the contents of variable2
>>>>>>         read line next into variable2
>>>>>>         compare variable1.MD5 to variable2.MD5
>>>>>>         if match
>>>>>>             prefix variable1 with $dupe
>>>>>>             append variable1 as new line of new csv file
>>>>>>         else
>>>>>>             increment dupe counter
>>>>>>     endwhile
>>>>>>
>>>>>> I realize I could be way off base on the algorithm here, but
>>>>>> that's what I've been able to dream up.
>>>>>>
>>>>>> Anyone care to critique and offer syntax suggestions - my googlefu
>>>>>> is about exhausted.
>>>>>>
>>>>>> Kurt
>>>>>>
>>>>>> On Thu, Jul 30, 2015 at 12:45 PM, Kurt Buff <[email protected]> wrote:
>>>>>>> I'm putting together what should be a simple little script, and failing.
>>>>>>>
>>>>>>> I am ultimately looking to run this against a directory, then
>>>>>>> sort the output on the hash field and then parse for duplicates.
>>>>>>> There are two conditions that concern me: 1) there are over 3m
>>>>>>> files in the target directory, and 2) many of the files are quite
>>>>>>> large, over 1g.
>>>>>>>
>>>>>>> I'm more concerned about the effects of the script on memory than
>>>>>>> on processor - the data is fairly static, and I intend to run it
>>>>>>> once a month or even less, but I did choose MD5 as the hash
>>>>>>> algorithm for speed, rather than accept the default of SHA256.
>>>>>>>
>>>>>>> This is pretty simple stuff, I'm sure, but I'm using this as a
>>>>>>> learning exercise more than anything, as there are duplicate file
>>>>>>> finders out in the world already.
>>>>>>>
>>>>>>> There are several problems with what I have put together so far,
>>>>>>> which is this:
>>>>>>>
>>>>>>> Get-ChildItem c:\stuff -Recurse | select length, fullname |
>>>>>>> export-csv -NoTypeInformation c:\temp\files.csv
>>>>>>> Import-CSV C:\temp\files.csv | ForEach-Object {
>>>>>>> (get-filehash -algorithm md5 $_.FullName) }; Length | Sort hash
>>>>>>>
>>>>>>> Using Length (or $_.Length) anywhere in the foreach statement
>>>>>>> gives an error, or gives weird output.
>>>>>>>
>>>>>>> Sample Output when not using Length, and therefore getting
>>>>>>> reasonable output (extra spaces and hyphen delimiters elided):
>>>>>>> Algorithm Hash                             Path
>>>>>>> MD5       592BE1AD0ED83C36D5E68CA7A014A510 C:\stuff\Tools\SomeFile.DOC
>>>>>>>
>>>>>>> What I'd like to see instead
>>>>>>> Hash                             Length Path
>>>>>>> 592BE1AD0ED83C36D5E68CA7A014A510 79872  C:\stuff\Tools\SomeFile.DOC
>>>>>>>
>>>>>>> If anyone can offer some instruction, I'd appreciate it.
>>>>>>>
>>>>>>> Kurt
>>>>>
>>>>> ================================================
>>>>> Did you know you can also post and find answers on PowerShell in
>>>>> the forums?
>>>>> http://www.myitforum.com/forums/default.asp?catApp=1
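
For anyone landing on this thread later: the follow-up goals in the Aug 3 message (number each set of duplicates, then total the duplicated space as the sum of n-1 file sizes per set) can probably be handled with Group-Object rather than a line-by-line comparison. What follows is only a minimal sketch under stated assumptions, not the script Kurt ended up with. The sample rows, the DupeSet column name and the variable names are all made up for illustration; the rows hold strings because that is what Import-Csv returns. Length is cast to [long] because "-as [int]" returns $null for sizes above 2,147,483,647 bytes, which may be why a handful of large files landed at the top of the sorted list.

```powershell
# Hypothetical sample rows, shaped like Import-Csv output (all values are strings).
$rows = @(
    [pscustomobject]@{ Hash = 'AAA'; Length = '100';        FullName = 'S:\a\one.doc' }
    [pscustomobject]@{ Hash = 'BBB'; Length = '50';         FullName = 'S:\a\two.doc' }
    [pscustomobject]@{ Hash = 'AAA'; Length = '100';        FullName = 'S:\b\one-copy.doc' }
    [pscustomobject]@{ Hash = 'AAA'; Length = '100';        FullName = 'S:\c\one-copy2.doc' }
    [pscustomobject]@{ Hash = 'CCC'; Length = '3182604288'; FullName = 'S:\iso\big.iso' }
    [pscustomobject]@{ Hash = 'CCC'; Length = '3182604288'; FullName = 'S:\iso\big-copy.iso' }
)

# Keep only the hashes that occur more than once.
$dupeGroups = @($rows | Group-Object -Property Hash | Where-Object { $_.Count -gt 1 })

# Tag every member of a duplicate set with the same set number.
$setNumber = 0
$dupes = foreach ($group in $dupeGroups) {
    $setNumber++
    foreach ($row in $group.Group) {
        [pscustomobject]@{
            DupeSet  = $setNumber              # illustrative column name
            Hash     = $row.Hash
            Length   = [long]$row.Length       # [long]: [int] cannot hold sizes over 2 GB
            FullName = $row.FullName
        }
    }
}

# Duplicated space: every copy after the first in each set.
$wastedBytes = [long]0
foreach ($group in $dupeGroups) {
    $wastedBytes += ($group.Count - 1) * [long]$group.Group[0].Length
}

$dupes | Format-Table -AutoSize
"Duplicate sets: $($dupeGroups.Count); duplicated bytes: $wastedBytes"
```

Against real data, $rows would come from Import-Csv on the hashed file list, and $dupes could be piped to Export-Csv instead of Format-Table. Grouping on Hash alone treats all zero-length files as one duplicate set, so they may need filtering out first.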

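
On the accented-file-name problem, here is a minimal sketch of the "-Encoding Unicode" round trip Michael suggests, combined with the -LiteralPath form of Get-FileHash already used in the script. It assumes PowerShell 4 or later (for Get-FileHash) and works in a throwaway folder under the temp directory. The folder name, the file name and the variable names are hypothetical. In Windows PowerShell, Export-Csv writes ASCII unless told otherwise, which would account for the e-grave turning into a question mark.

```powershell
# Scratch folders under the temp directory; these names are illustrative only.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ('dupetest_' + [guid]::NewGuid().ToString('N'))
$data = Join-Path $root 'data'
New-Item -ItemType Directory -Path $data -Force | Out-Null

# A file whose name contains e-grave (U+00E8). Building it from the code point
# keeps the example independent of how this script file itself is encoded.
$name = 'r' + [char]0x00E8 + 'gle.txt'
Set-Content -LiteralPath (Join-Path $data $name) -Value 'sample'

$csv = Join-Path $root 'fileList.csv'

# Write the file list as Unicode so the accented character survives.
Get-ChildItem -LiteralPath $data -File -Recurse |
    Select-Object Length, FullName |
    Export-Csv -NoTypeInformation -Encoding Unicode -Path $csv

# Read it back with the same encoding and hash each file by literal path.
$hashed = @(Import-Csv -Encoding Unicode -Path $csv |
    Select-Object @{ Name = 'Hash'; Expression = { (Get-FileHash -Algorithm MD5 -LiteralPath $_.FullName).Hash } },
                  Length, FullName)

$hashed | Format-List

# Clean up the scratch folder.
Remove-Item -LiteralPath $root -Recurse -Force
```

If the Hash column comes back empty for a row, comparing the FullName value in the CSV with the name on disk is a reasonable first check.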