Out-freaking-standing.

Now I can move on to the next task.

Thank you again.

I'll be back...

Kurt

On Thu, Aug 6, 2015 at 5:55 PM, Michael B. Smith <[email protected]> wrote:
> Add "-Encoding Unicode" to both Export-Csv and Import-Csv each time you use 
> them.
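A minimal sketch of the suggestion above, assuming Windows PowerShell, where (as far as I know) Export-Csv writes ASCII by default and so turns characters such as an e with a grave accent into question marks. The temp file name and the sample path are made up for illustration; only the -Encoding Unicode switch comes from the thread.

```powershell
# Hypothetical temp file, used only for this demonstration.
$csv = Join-Path ([System.IO.Path]::GetTempPath()) 'encoding-demo.csv'

# A made-up path containing e-grave (U+00E8), built from its code point so
# that the encoding of the script file itself does not matter.
$name = 'C:\stuff\Gen' + [char]0x00E8 + 've\report.doc'

# Write and read the CSV with the same explicit encoding on both sides.
[pscustomobject]@{ Length = 79872; FullName = $name } |
    Export-Csv -NoTypeInformation -Encoding Unicode -Path $csv

$roundTrip = Import-Csv -Encoding Unicode -Path $csv

# True when the accented character survived the trip through the CSV file.
$roundTrip.FullName -eq $name
```

Dropping -Encoding Unicode from the Export-Csv line is a quick way to check whether the default encoding on a given machine is what mangles the name.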
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] 
> On Behalf Of Kurt Buff
> Sent: Thursday, August 6, 2015 8:42 PM
> To: [email protected]
> Subject: Re: [powershell] Re: Need some pointers on an exercise I've set for 
> myself
>
> Well, nuts.
>
> That didn't fix it.
>
> Oh, wait!
>
> I found something...
>
> The files that aren't getting hashed have question marks in the 
> file/directory name somewhere - but not always in the same place, it's mixed 
> between the file name and the directory name(s) - sometimes one, and 
> sometimes the other. That question mark is a translation of an e with an 
> accent grave.
>
> Anyone have thoughts on getting around that?
>
> Kurt
>
> On Thu, Aug 6, 2015 at 5:11 PM, Kurt Buff <[email protected]> wrote:
>> OK - with that hint, I've solved that problem. Script has been updated
>> to prompt for the directory with read-host and set a variable.
>>
>> We'll see if that fixes the problem with missing hashes
>>
>> Kurt
>>
>> On Thu, Aug 6, 2015 at 3:16 PM, Michael B. Smith <[email protected]> 
>> wrote:
>>> You don't use $_ you use $input.
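A rough sketch of what that looks like, assuming the script has no param or process block. The script block stands in for the .ps1 file so the example is self-contained, and the demo directory and file name are hypothetical.

```powershell
# Hypothetical stand-in for the script file, written as a script block.
$finder = {
    # $input enumerates whatever was piped to the script or script block;
    # $_ is not set here because there is no process block.
    foreach ($dir in $input) {
        Get-ChildItem -LiteralPath $dir -File -Recurse |
            Select-Object Length, FullName
    }
}

# Build a small throwaway directory so the example has something to list.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'input-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
Set-Content -Path (Join-Path $dir 'a.txt') -Value 'hello'

# Pipe the directory in, the same way the script is invoked in the thread.
$files = $dir | & $finder
$files
```

A param block with a directory parameter would be the more conventional design once the script is polished; $input is just the smallest change from the piped invocation.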
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Kurt Buff
>>> Sent: Thursday, August 6, 2015 6:07 PM
>>> To: [email protected]
>>> Subject: Re: [powershell] Re: Need some pointers on an exercise I've
>>> set for myself
>>>
>>> Well, fine then! Don't execute scripts from the ISE... :)
>>>
>>> But, I've saved it to a .ps1 file, and am trying to run it in the
>>> regular shell, and am not seeing my expected results.
>>>
>>> Named DupeFileFinder.ps1, I execute it like so:
>>>
>>> c:\Batchfiles>"G:\Groups\Information Technology" | .\DupeFileFinder.ps1
>>>
>>> or
>>>
>>> c:\Batchfiles>"G:\Groups\Information Technology" | c:\batchfiles\dupfilefinder.ps1
>>>
>>> and get output regarding the files in c:\batchfiles, not about
>>> "g:\groups\information technology"
>>>
>>> The script currently looks like this (and as I get this polished up,
>>> I'll configure it to accept directories as a parameter - haven't
>>> gotten that far yet):
>>>
>>> ----------Begin DupeFileFinder.csv----------
>>> # Generate file.csv
>>> Get-ChildItem $_ -File -Recurse | select length, fullname |
>>> export-csv -NoTypeInformation c:\temp\fileList.csv
>>>
>>> # Generate filesWithHash.csv
>>> Import-CSV C:\temp\fileList.csv | Select-Object -Property
>>> @{Name="Hash";Expression={(get-filehash -algorithm md5 -literalPath
>>> $_.FullName).Hash}},Length,FullName | export-csv -NoTypeInformation
>>> c:\temp\fileListHashed.csv
>>>
>>> # Sort files ascending by Hash
>>> Import-CSV C:\temp\fileListHashed.csv | Sort-Object Hash | export-csv
>>> -NoTypeInformation c:\temp\FileListHashedSortedOnHash.csv
>>>
>>> # Extract non-unique files from the list
>>> Import-Csv C:\temp\FileListHashedSortedOnHash.csv |
>>> Group-Object -property Hash | Where-Object { $_.count -gt 1 } |
>>> Select -Expand Group |
>>> Export-Csv -NoTypeInformation c:\temp\fileDupesWithHash.csv
>>> ----------End DupeFileFinder.csv----------
>>>
>>> On Thu, Aug 6, 2015 at 2:19 PM, Michael B. Smith <[email protected]> 
>>> wrote:
>>>> I and U are beside each other. :)
>>>>
>>>>
>>>>
>>>> Don’t use ISE.
>>>>
>>>>
>>>>
>>>> From: [email protected]
>>>> [mailto:[email protected]]
>>>> On Behalf Of Webster
>>>> Sent: Thursday, August 6, 2015 4:17 PM
>>>>
>>>>
>>>> To: [email protected]
>>>> Subject: Re: [powershell] Re: Need some pointers on an exercise I've
>>>> set for myself
>>>>
>>>>
>>>>
>>>> Don't use use?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Carl Webster
>>>>
>>>> Consultant and Citrix Technology Professional
>>>>
>>>> http://www.CarlWebster.com
>>>>
>>>>
>>>>
>>>> ________________________________
>>>>
>>>> From: [email protected]
>>>> <[email protected]> on behalf of Michael B. Smith
>>>> <[email protected]>
>>>> Sent: Thursday, August 6, 2015 3:15 PM
>>>> To: [email protected]
>>>> Subject: RE: [powershell] Re: Need some pointers on an exercise I've
>>>> set for myself
>>>>
>>>>
>>>>
>>>> Don't use use. :-)
>>>>
>>>> Sent from my Windows Phone
>>>>
>>>> ________________________________
>>>>
>>>> From: Kurt Buff
>>>> Sent: 8/6/2015 1:09 PM
>>>> To: [email protected]
>>>> Subject: Re: [powershell] Re: Need some pointers on an exercise I've
>>>> set for myself
>>>>
>>>> Sorry, yes, when I said I ran it manually, I meant that I ran it
>>>> from the normal shell, not from the ISE.
>>>>
>>>> Kurt
>>>>
>>>> On Thu, Aug 6, 2015 at 12:50 PM, Michael B. Smith
>>>> <[email protected]>
>>>> wrote:
>>>>> Do you get different behavior running it from the normal shell?
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of Kurt Buff
>>>>> Sent: Thursday, August 6, 2015 2:20 PM
>>>>> To: [email protected]
>>>>> Subject: [powershell] Re: Need some pointers on an exercise I've
>>>>> set for myself
>>>>>
>>>>> Getting much closer...
>>>>>
>>>>> When running this line of code:
>>>>>
>>>>> Import-CSV C:\temp\IT-files.csv | Select-Object -Property
>>>>> @{Name="Hash";Expression={(get-filehash -algorithm md5 -literalPath
>>>>> $_.FullName).Hash}},Length,FullName | export-csv -NoTypeInformation
>>>>> c:\temp\IT-filehash.csv
>>>>>
>>>>> I get 18 files that don't get a hash (out of 22,727 files, so I'm
>>>>> not hugely fussed about it). So, out of curiosity, I ran
>>>>> get-filehash against them manually, that is, not as an entry in a CSV 
>>>>> file.
>>>>>
>>>>> For one of them, I've identified why - someone has it open for
>>>>> writing, which once I think about it is not unexpected
>>>>>
>>>>> But I'm not seeing any error output in the ISE for that file or for
>>>>> the rest, which is a bit strange. For the files that aren't open,
>>>>> when I manually run get-filehash against them, I get a hash just fine.
>>>>>
>>>>> So, for grins, I ran it again from the ISE against a CSV file
>>>>> containing only the headers and the list of files that didn't hash
>>>>> originally. I *still* don't get a hash, or an error code for the file
>>>>> that's open for write. The files that don't get a hash are just PDF
>>>>> and DOC files.
>>>>>
>>>>> Anyone run into anything like this?
>>>>>
>>>>> On Mon, Aug 3, 2015 at 5:25 PM, Kurt Buff <[email protected]> wrote:
>>>>>> Replying to myself, since that seems the reasonable thing to do here.
>>>>>>
>>>>>> I've tested the following against a smaller directory that I know
>>>>>> has some duplicates, and am getting progress. Here is what I have
>>>>>> so far (work with the line wraps!):
>>>>>>
>>>>>> Get-ChildItem S:\ -File -Recurse | select fullname, length |
>>>>>> Export-CSV -NoTypeInformation c:\temp\files.csv
>>>>>>
>>>>>> Import-CSV c:\temp\files.csv | Select-Object -Property
>>>>>> @{Name="MD5";Expression={(Get-Filehash -algorithm md5
>>>>>> $_.FullName).Hash}},Length,FullName | Export-CSV -NoTypeInformation
>>>>>> c:\temp\filehash.csv
>>>>>>
>>>>>> Import-CSV C:\temp\checker\fileMD5.csv | Sort-Object
>>>>>> @{Expression={$_.Length -as [int]}} | Export-CSV
>>>>>> -NoTypeInformation c:\temp\checker\FileMD5Sorted.csv
>>>>>>
>>>>>> The above generates a file of 315286 lines (not including header)
>>>>>> - of course, that's the number of files in the directory tree. I
>>>>>> get output that looks like this (work with the line wraps again):
>>>>>>
>>>>>> "MD5","Length","FullName"
>>>>>>
>>>>>> "6467C3875955DF4514395F0AFCAAA62A","3182604288","S:\Infrastructure\Microsoft\OSes\Win7EntSP1_64bit\SW_DVD5_SA_Win_Ent_7w_SP1_64BIT_English_-2_MLF_X17-58882.ISO"
>>>>>>
>>>>>> I noticed two oddities, however:
>>>>>>
>>>>>> o- zero-length files generate a hash, and of course the hash is
>>>>>> the same for all of them. I probably should have expected that,
>>>>>> but it surprised me.
>>>>>>
>>>>>> o- I find a handful of files (22 of them) at the top of the csv
>>>>>> file after sorting that don't seem to obey the sorting on the hash
>>>>>> that the other files followed. It's very strange. They're not
>>>>>> duplicates of any other files; their hashes and file sizes are out
>>>>>> of sort order from all of the rest, AFAICT. I'm not sure what to make of 
>>>>>> that.
>>>>>>
>>>>>> But, ignoring those two things, I'd like to proceed a bit further:
>>>>>>
>>>>>> o- Writing to another file only those lines that are duplicate
>>>>>> files, which I can do by selecting the lines that have
>>>>>> matching hashes (and possibly also matching sizes)
>>>>>>
>>>>>> o- Possibly adding another column, which would contain an integer
>>>>>> that would increment for each set of matched files, which would
>>>>>> probably lead to...
>>>>>>
>>>>>> o- Among other things, calculating the amount of duplicated space
>>>>>> (sum of n-1 file sizes for each set of dupes), identifying
>>>>>> duplicate directories that can be eliminated in toto, etc.
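The three goals above could be sketched with Group-Object instead of a line-by-line comparison. This is only one possible approach under stated assumptions: the sample rows are invented stand-ins for the hashed CSV, and the DupeSet column name is hypothetical.

```powershell
# Invented sample rows standing in for the hashed CSV (MD5, Length, FullName).
# Length is a string here because Import-Csv returns every column as text.
$rows = @(
    [pscustomobject]@{ MD5 = 'AAA'; Length = '100'; FullName = 'S:\a\one.doc' }
    [pscustomobject]@{ MD5 = 'AAA'; Length = '100'; FullName = 'S:\b\one.doc' }
    [pscustomobject]@{ MD5 = 'BBB'; Length = '250'; FullName = 'S:\a\two.pdf' }
    [pscustomobject]@{ MD5 = 'CCC'; Length = '40';  FullName = 'S:\a\x.txt' }
    [pscustomobject]@{ MD5 = 'CCC'; Length = '40';  FullName = 'S:\b\x.txt' }
    [pscustomobject]@{ MD5 = 'CCC'; Length = '40';  FullName = 'S:\c\x.txt' }
)

# Keep only the hashes that occur more than once.
$dupeGroups = @($rows | Group-Object -Property MD5 | Where-Object { $_.Count -gt 1 })

$setNumber = 0
$wasted = [long]0
$dupes = foreach ($group in $dupeGroups) {
    $setNumber++
    # n-1 copies in each set are redundant, so count their sizes as wasted.
    $wasted += ($group.Count - 1) * [long]$group.Group[0].Length
    foreach ($row in $group.Group) {
        # Emit the original row with the set counter added in front.
        [pscustomobject]@{
            DupeSet  = $setNumber
            MD5      = $row.MD5
            Length   = $row.Length
            FullName = $row.FullName
        }
    }
}

$dupes    # five rows, each tagged with the number of its duplicate set
$wasted   # 180 bytes for this sample: 1 x 100 + 2 x 40
```

With real data, $rows would come from Import-Csv and $dupes could be piped to Export-Csv; grouping holds every row in memory, which may matter with millions of files.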
>>>>>>
>>>>>> But, I'm stymied on the execution of the logic. I'm such an
>>>>>> inexperienced programmer that I'm flailing on the first of these
>>>>>> steps. I believe I need to make a stepwise comparison of the MD5
>>>>>> column, which I think would look something like this:
>>>>>>
>>>>>>      $dupe = 1
>>>>>>      read infile.line1 into variable1
>>>>>>      read infile.line2 into variable2
>>>>>>      if {
>>>>>>           variable1.MD5 -eq variable2.MD5
>>>>>>           prefix variable1 with dupe counter
>>>>>>           write variable1 to the new csv file
>>>>>>           while not eof
>>>>>>                set variable1 to the contents of variable2
>>>>>>                read line next into variable2
>>>>>>                compare variable1.MD5 to variable2.MD5
>>>>>>           if match
>>>>>>                prefix variable1 with $dupe
>>>>>>                append variable1 as new line of new csv file
>>>>>>           else
>>>>>>                increment dupe counter
>>>>>>      endwhile }
>>>>>>     else {
>>>>>>           while not eof
>>>>>>                set variable1 to the contents of variable2
>>>>>>                read line next into variable2
>>>>>>                compare variable1.MD5 to variable2.MD5
>>>>>>           if match
>>>>>>                prefix variable1 with $dupe
>>>>>>                append variable1 as new line of new csv file
>>>>>>           else
>>>>>>                increment dupe counter
>>>>>>      endwhile
>>>>>>
>>>>>> I realize I could be way off base on the algorithm here, but
>>>>>> that's what I've been able to dream up.
>>>>>>
>>>>>> Anyone care to critique and offer syntax suggestions - my googlefu
>>>>>> is about exhausted.
>>>>>>
>>>>>> Kurt
>>>>>>
>>>>>> On Thu, Jul 30, 2015 at 12:45 PM, Kurt Buff <[email protected]> wrote:
>>>>>>> I'm putting together what should be a simple little script, and failing.
>>>>>>>
>>>>>>> I am ultimately looking to run this against a directory, then
>>>>>>> sort the output on the hash field and then parse for duplicates.
>>>>>>> There are two conditions that concern me: 1) there are over 3 million
>>>>>>> files in the target directory, and 2) many of the files are quite
>>>>>>> large, over 1 GB.
>>>>>>>
>>>>>>> I'm more concerned about the effects of the script on memory than
>>>>>>> on processor - the data is fairly static, and I intend to run it
>>>>>>> once a month or even less, but I did choose MD5 as the hash
>>>>>>> algorithm for speed, rather than accept the default of SHA256.
>>>>>>>
>>>>>>> This is pretty simple stuff, I'm sure, but I'm using this as a
>>>>>>> learning exercise more than anything, as there are duplicate file
>>>>>>> finders out in the world already.
>>>>>>>
>>>>>>> There are several problems with what I have put together so far,
>>>>>>> which is this:
>>>>>>>
>>>>>>>      Get-ChildItem c:\stuff -Recurse | select length, fullname |
>>>>>>> export-csv -NoTypeInformation c:\temp\files.csv
>>>>>>>      Import-CSV C:\temp\files.csv | ForEach-Object {
>>>>>>> (get-filehash -algorithm md5 $_.FullName) }; Length | Sort hash
>>>>>>>
>>>>>>> Using Length (or $_.Length) anywhere in the foreach statement
>>>>>>> gives an error, or gives weird output.
>>>>>>>
>>>>>>> Sample Output when not using Length, and therefore getting
>>>>>>> reasonable output (extra spaces and hyphen delimiters elided):
>>>>>>>      Algorithm   Hash                                Path
>>>>>>>      MD5         592BE1AD0ED83C36D5E68CA7A014A510    C:\stuff\Tools\SomeFile.DOC
>>>>>>>
>>>>>>> What I'd like to see instead:
>>>>>>>      Hash                                Length   Path
>>>>>>>      592BE1AD0ED83C36D5E68CA7A014A510    79872    C:\stuff\Tools\SomeFile.DOC
>>>>>>>
>>>>>>> If anyone can offer some instruction, I'd appreciate it.
>>>>>>>
>>>>>>> Kurt
>>>>>
>>>>>
>>>>> ================================================
>>>>> Did you know you can also post and find answers on PowerShell in
>>>>> the forums?
>>>>> http://www.myitforum.com/forums/default.asp?catApp=1