No, I is in Birmingham, AL and U are in cville, VA. Hardly next to each other. :)
Webster

From: <[email protected]> on behalf of Michael Smith
Reply-To: "[email protected]"
Date: Thursday, August 6, 2015 at 4:19 PM
To: "[email protected]"
Subject: RE: [powershell] Re: Need some pointers on an exercise I've set for myself

I and U are beside each other. ☺

Don’t use ISE.

From: [email protected] [mailto:[email protected]] On Behalf Of Webster
Sent: Thursday, August 6, 2015 4:17 PM
To: [email protected]
Subject: Re: [powershell] Re: Need some pointers on an exercise I've set for myself

Don't use use?

Carl Webster
Consultant and Citrix Technology Professional
http://www.CarlWebster.com

________________________________
From: [email protected] <[email protected]> on behalf of Michael B. Smith <[email protected]>
Sent: Thursday, August 6, 2015 3:15 PM
To: [email protected]
Subject: RE: [powershell] Re: Need some pointers on an exercise I've set for myself

Don't use use. :-)

Sent from my Windows Phone
________________________________
From: Kurt Buff <[email protected]>
Sent: 8/6/2015 1:09 PM
To: [email protected]
Subject: Re: [powershell] Re: Need some pointers on an exercise I've set for myself

Sorry, yes, when I said I ran it manually, I meant that I ran it from the normal shell, not from the ISE.

Kurt

On Thu, Aug 6, 2015 at 12:50 PM, Michael B. Smith <[email protected]> wrote:
> Do you get different behavior running it from the normal shell?
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Kurt Buff
> Sent: Thursday, August 6, 2015 2:20 PM
> To: [email protected]
> Subject: [powershell] Re: Need some pointers on an exercise I've set for
> myself
>
> Getting much closer...
>
> When running this line of code:
>
> Import-CSV C:\temp\IT-files.csv | Select-Object -Property
> @{Name="Hash";Expression={(get-filehash -algorithm md5 -literalPath
> $_.FullName).Hash}},Length,FullName | export-csv -NoTypeInformation
> c:\temp\IT-filehash.csv
>
> I get 18 files that don't get a hash (out of 22,727 files, so I'm not hugely
> fussed about it). So, out of curiosity, I ran get-filehash against them
> manually, that is, not as an entry in a CSV file.
>
> For one of them, I've identified why - someone has it open for writing, which
> once I think about it is not unexpected
>
> But, I'm not seeing error output in the ISE for that file, and for the rest,
> which is a bit strange, and for the files that aren't opened, and I manually
> do a get-filehash against them, I get a hash just fine.
>
> So, for grins, I ran it again from the ISE, against a CSV file containing
> only the headers and the list of files that didn't hash originally, I *still*
> don't get a hash, or an error code for the file that's open for write. The
> files that don't get a hash are just PDF and DOC files.
>
> Anyone run into anything like this?
>
> On Mon, Aug 3, 2015 at 5:25 PM, Kurt Buff
> <[email protected]> wrote:
>> Replying to myself, since that seems the reasonable thing to do here.
>>
>> I've tested the following against a smaller directory that I know has
>> some duplicates, and am getting progress.
>> Here is what I have so far
>> (work with the line wraps!):
>>
>> Get-ChildItem S:\ -File -Recurse | select fullname, length |
>> Export-CSV -NoTypeInformation c:\temp\files.csv
>>
>> Import-CSV c:\temp\files.csv | Select-Object -Property
>> @{Name="MD5";Expression={(Get-Filehash -algorithm md5
>> $_.FullName).MD5}},Length,FullName | Export-CSV -NoTypeInformation
>> c:\temp\filehash.csv
>>
>> Import-CSV C:\temp\checker\fileMD5.csv | Sort-Object
>> @{Expression={$_.Length -as [int]}} | Export-CSV -NoTypeInformation
>> c:\temp\checker\FileMD5Sorted.csv
>>
>> The above generates a file of 315286 lines (not including header) - of
>> course, that's the number of files in the directory tree. I get output
>> that looks like this (work with the line wraps again):
>>
>> "MD5","Length","FullName"
>> "6467C3875955DF4514395F0AFCAAA62A","3182604288","S:\Infrastructure\Microsoft\OSes\Win7EntSP1_64bit\SW_DVD5_SA_Win_Ent_7w_SP1_64BIT_English_-2_MLF_X17-58882.ISO"
>>
>> I noticed two oddities, however:
>>
>> o- zero-length files generate a hash, and of course the hash is the
>> same for all of them. I probably should have expected that, but it
>> surprised me.
>>
>> o- I find a handful of files (22 of them) at the top of the csv file
>> after sorting that don't seem to obey the sorting on the hash that the
>> other files followed. It's very strange. They're not duplicates of any
>> other files; their hashes and file sizes are out of sort order from
>> all of the rest, AFAICT. I'm not sure what to make of that.
>>
>> But, ignoring those two things, I'd like to proceed a bit further:
>>
>> o- Writing to another file only those lines that are duplicate files,
>> which I can do by selecting the lines that have matching
>> hashes (and possibly also matching sizes)
>>
>> o- Possibly adding another column, which would contain an integer that
>> would increment for each set of matched files, which would probably
>> lead to...
>>
>> o- Among other things, calculating the amount of duplicated space (sum
>> of n-1 file sizes for each set of dupes), identifying duplicate
>> directories that can be eliminated in toto, etc.
>>
>> But, I'm stymied on the execution of the logic. I'm such an
>> inexperienced programmer that I'm flailing on the first of these
>> steps. I believe I need to make a stepwise comparison of the MD5
>> column, which I think would look something like this:
>>
>> $dupe = 1
>> read infile.line1 into variable1
>> read infile.line2 into variable2
>> if {
>> variable1.MD5 -eq variable2.MD5
>> prefix variable1 with dupe counter
>> write variable1 to the new csv file
>> while not eof
>> set variable1 to the contents of variable2
>> read line next into variable2
>> compare variable1.MD5 to variable2.MD5
>> if match
>> prefix variable1 with $dupe
>> append variable1 as new line of new csv file
>> else
>> increment dupe counter
>> endwhile }
>> else {
>> while not eof
>> set variable1 to the contents of variable2
>> read line next into variable2
>> compare variable1.MD5 to variable2.MD5
>> if match
>> prefix variable1 with $dupe
>> append variable1 as new line of new csv file
>> else
>> increment dupe counter
>> endwhile
>>
>> I realize I could be way off base on the algorithm here, but that's
>> what I've been able to dream up.
>>
>> Anyone care to critique and offer syntax suggestions - my googlefu is
>> about exhausted.
>>
>> Kurt
>>
>> On Thu, Jul 30, 2015 at 12:45 PM, Kurt Buff
>> <[email protected]> wrote:
>>> I'm putting together what should be a simple little script, and failing.
>>>
>>> I am ultimately looking to run this against a directory, then sort
>>> the output on the hash field and then parse for duplicates. There are
>>> two conditions that concern me: 1) there are over 3m files in the
>>> target directory, and 2) many of the files are quite large, over 1g.
>>>
>>> I'm more concerned about the effects of the script on memory than on
>>> processor - the data is fairly static, and I intend to run it once a
>>> month or even less, but I did choose MD5 as the hash algorithm for
>>> speed, rather than accept the default of SHA256.
>>>
>>> This is pretty simple stuff, I'm sure, but I'm using this as a
>>> learning exercise more than anything, as there are duplicate file
>>> finders out in the world already.
>>>
>>> There are several problems with what I have put together so far,
>>> which is this:
>>>
>>> Get-ChildItem c:\stuff -Recurse | select length, fullname |
>>> export-csv -NoTypeInformation c:\temp\files.csv
>>> Import-CSV C:\temp\files.csv | ForEach-Object { (get-filehash
>>> -algorithm md5 $_.FullName) }; Length | Sort hash
>>>
>>> Using Length (or $_.Length) anywhere in the foreach statement gives
>>> an error, or gives weird output.
>>>
>>> Sample Output when not using Length, and therefore getting reasonable
>>> output (extra spaces and hyphen delimiters elided):
>>> Algorithm  Hash                              Path
>>> MD5        592BE1AD0ED83C36D5E68CA7A014A510  C:\stuff\Tools\SomeFile.DOC
>>>
>>> What I'd like to see instead
>>> Hash                              Length  Path
>>> 592BE1AD0ED83C36D5E68CA7A014A510  79872   C:\stuff\Tools\SomeFile.DOC
>>>
>>> If anyone can offer some instruction, I'd appreciate it.
>>>
>>> Kurt
>
>
> ================================================
> Did you know you can also post and find answers on PowerShell in the forums?
> http://www.myitforum.com/forums/default.asp?catApp=1

================================================
Did you know you can also post and find answers on PowerShell in the forums?
http://www.myitforum.com/forums/default.asp?catApp=1
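
For the duplicate-finding step Kurt describes in the thread (keep only the rows whose hashes match, number each set, then total the duplicated space), here is a minimal PowerShell sketch. It groups rows with Group-Object instead of walking a sorted file line by line, which is a swapped-in technique and not something anyone in the thread proposed. The function name Find-DuplicateRow, the DupeSet column, and the sample rows are hypothetical. The rows are assumed to carry Hash, Length and FullName properties, as in the CSV produced by the Select-Object line above, and files with equal hashes are assumed to have equal lengths. Length is cast to [int64] because the 3182604288-byte ISO in the sample output is larger than an [int] can hold.

```powershell
# Hypothetical helper (not from the thread): returns only the rows whose Hash
# appears more than once, each tagged with a per-set counter.
function Find-DuplicateRow {
    param(
        [Parameter(Mandatory)]
        [object[]]$Row    # objects with Hash, Length and FullName properties
    )

    # Rows with an empty hash (e.g. files that could not be read) are skipped.
    $groups = $Row |
        Where-Object { $_.Hash } |
        Group-Object -Property Hash |
        Where-Object { $_.Count -gt 1 }

    $setNumber = 0
    foreach ($group in $groups) {
        $setNumber++
        foreach ($item in $group.Group) {
            [pscustomobject]@{
                DupeSet  = $setNumber
                Hash     = $item.Hash
                # Import-Csv yields strings; [int64] also fits files over 2 GB
                Length   = [int64]$item.Length
                FullName = $item.FullName
            }
        }
    }
}

# Made-up sample rows standing in for Import-Csv output.
$sample = @(
    [pscustomobject]@{ Hash = 'AAA'; Length = '10'; FullName = 'C:\demo\a1.txt' }
    [pscustomobject]@{ Hash = 'BBB'; Length = '7';  FullName = 'C:\demo\unique.txt' }
    [pscustomobject]@{ Hash = 'AAA'; Length = '10'; FullName = 'C:\demo\a2.txt' }
    [pscustomobject]@{ Hash = 'CCC'; Length = '5';  FullName = 'C:\demo\c1.txt' }
    [pscustomobject]@{ Hash = '';    Length = '3';  FullName = 'C:\demo\locked.doc' }
    [pscustomobject]@{ Hash = 'CCC'; Length = '5';  FullName = 'C:\demo\c2.txt' }
    [pscustomobject]@{ Hash = 'CCC'; Length = '5';  FullName = 'C:\demo\c3.txt' }
)

$dupes = Find-DuplicateRow -Row $sample

# Duplicated space: every copy but one in each set is redundant.
$wasted = [int64]0
foreach ($set in ($dupes | Group-Object -Property DupeSet)) {
    $wasted += ($set.Count - 1) * $set.Group[0].Length
}
```

Against a real run, the sample rows would be replaced by the output of Import-Csv on the hash file, and $dupes could be piped to Export-Csv -NoTypeInformation to get the duplicates-only file.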
