Summary:
I would like a Perl script that converts the output of the Windows dir
command so that every line has the same format and includes the
directory the entry is in and its extension. The date and time should
use a format that can be sorted as a string, e.g. yyyy-mm-dd and a
24-hour clock.
I think pipe delimited would work best, as the pipe character | cannot
appear in a file name, and that would let me sort the output, and/or
load it into a database.
Details:
I could probably write this in an hour but laziness is a virtue, and if
someone has got one already that will probably be better anyway.
I want to translate lines like this:
Directory of C:\_from_laptop\AAA BBB_files
04/14/2003 10:21 AM 123 abc
04/14/2003 11:00 PM 0 empty.jpg.txt
into lines something like this. Note that I moved the file name and
extension earlier in the line, so that the natural sort is by directory
and file name, and a sort on the last two fields is by time. (I have a
port of Unix sort in my c:\bin\ directory that I can use.)
C:\_from_laptop\AAA BBB_files|abc||File|123|2003-04-14|10:21
C:\_from_laptop\AAA BBB_files|empty.jpg|txt|File|0|2003-04-14|23:00
None of it is tricky. You just need to remember what Directory line you
saw last, convert the date and time fields, insert either File or Dir
depending on its type, and write out each line that comes from a file or
dir (except skip all the . and .. dirs). Note that a file named
foo.bar.txt has a name of foo.bar and extension of txt. Some files can
have no extension, and some directories do have an extension.
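For illustration, the steps above could be sketched roughly as follows.
This is Python rather than Perl, purely to show the logic, and it is a
sketch under assumptions rather than a finished tool: the regular
expressions assume the US-style date and AM/PM layout shown in this
post, the function name convert is made up, directories are written
with an empty size field, and a name that starts with a dot is treated
as having no extension.

```python
import re
from datetime import datetime

# Assumed layout of an entry line from `dir /-C /4`: date, time, AM/PM,
# then either <DIR> or a byte count, then the name (which may contain spaces).
ENTRY = re.compile(
    r"^(\d{2}/\d{2}/\d{4})\s+(\d{2}:\d{2})\s+([AP]M)\s+(<DIR>|\d+)\s+(.+)$"
)
DIRECTORY = re.compile(r"^\s*Directory of (.+)$")


def convert(lines):
    """Yield one pipe-delimited record per file or directory entry."""
    current_dir = ""
    for line in lines:
        line = line.rstrip("\r\n")
        m = DIRECTORY.match(line)
        if m:
            current_dir = m.group(1)  # remember the last Directory line seen
            continue
        m = ENTRY.match(line)
        if not m:
            continue  # boilerplate, totals, and blank lines are skipped
        date, time, ampm, size, name = m.groups()
        if name in (".", ".."):
            continue
        stamp = datetime.strptime(f"{date} {time} {ampm}", "%m/%d/%Y %I:%M %p")
        kind = "Dir" if size == "<DIR>" else "File"
        if kind == "Dir":
            size = ""  # assumption: directories get an empty size field
        # Split on the last dot only, so foo.bar.txt -> foo.bar + txt.
        base, dot, ext = name.rpartition(".")
        if not dot or not base:
            base, ext = name, ""  # no extension (or a leading-dot name)
        yield "|".join([current_dir, base, ext, kind, size,
                        stamp.strftime("%Y-%m-%d"), stamp.strftime("%H:%M")])
```

Feeding it the lines of dir.txt and printing each record it yields
would give output in the format shown earlier, ready for sort.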
Here is an excerpt of the output. (Because it is an excerpt the totals
for Files and Bytes are not right.)
Note that there are a few lines of boilerplate at the beginning which
can be ignored, and a few lines at the end which can be ignored (or used
as a sanity check on the totals). Note also that a file might not have
an extension, that a file or directory can be empty, and that a name can
contain white space and strange characters.
Volume in drive C has no label.
Volume Serial Number is A898-B50D
Directory of C:\_from_laptop
01/23/2005 08:37 AM <DIR> .
01/23/2005 08:37 AM <DIR> ..
04/14/2003 01:46 PM <DIR> _from_c
02/06/2001 01:34 PM 15618 0101.txt
02/06/2001 01:34 PM 15618 abc
04/14/2003 10:22 AM 32451 AAA BBB.htm
01/17/2005 09:53 AM <DIR> AAA BBB_files
04/04/2000 06:14 PM 27648 acm_pubform.doc
01/17/2005 09:53 AM <DIR> acrobat
01/17/2005 09:54 AM <DIR> address
08/17/2004 10:04 AM 0 zzz
650 File(s) 92010877 bytes
Directory of C:\_from_laptop\AAA BBB_files
01/17/2005 09:53 AM <DIR> .
01/17/2005 09:53 AM <DIR> ..
04/14/2003 10:21 AM 1045 abc
04/14/2003 10:21 AM 0 empty.jpg.txt
04/14/2003 10:22 AM 32451 AAA BBB CCC.htm
01/17/2005 09:53 AM <DIR> AAA BBB_CCC_files
04/14/2003 10:21 AM 43 spacer.gif
11 File(s) 37476 bytes
Directory of C:\_from_laptop\AAA BBB CCC_files
01/17/2005 09:53 AM <DIR> .
01/17/2005 09:53 AM <DIR> ..
0 File(s) 0 bytes
Total Files Listed:
245909 File(s) 28969650933 bytes
154376 Dir(s) 31272304640 bytes free
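The per-directory "File(s)" lines in a listing like the one above could
serve as the sanity check mentioned earlier. The following is a rough
Python sketch of that idea, not a tested tool: the function name
check_totals is made up, and the patterns assume the same line layout
as the excerpt. Run on the excerpt itself it would report mismatches,
since lines were cut out of it.

```python
import re

ENTRY = re.compile(r"^\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}\s+[AP]M\s+(<DIR>|\d+)\s+.+$")
DIRECTORY = re.compile(r"^\s*Directory of (.+)$")
TOTAL = re.compile(r"^\s*(\d+) File\(s\)\s+(\d+) bytes\s*$")


def check_totals(lines):
    """Return (directory, counted, reported) for each directory whose
    File(s) line disagrees with the entries listed under it."""
    mismatches = []
    current_dir = None
    files = size = 0
    for line in lines:
        line = line.rstrip("\r\n")
        m = DIRECTORY.match(line)
        if m:
            current_dir, files, size = m.group(1), 0, 0
            continue
        m = ENTRY.match(line)
        if m:
            if m.group(1) != "<DIR>":  # only files count toward File(s)
                files += 1
                size += int(m.group(1))
            continue
        m = TOTAL.match(line)
        if m and current_dir is not None:
            reported = (int(m.group(1)), int(m.group(2)))
            if reported != (files, size):
                mismatches.append((current_dir, (files, size), reported))
            # The grand total after "Total Files Listed:" is not tied to a
            # single directory, so it is ignored by clearing current_dir.
            current_dir = None
    return mismatches
```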
Background:
My laptop died a few days ago. The process to recover files and
directories from it seems to have missed a lot of files. I have a
directory on another machine that I have been backing up to. I want to
find out which files are missing. I have run dir on the backed-up
machine, and will run dir on the new machine, and then diff the outputs.
The diff will work best if each line in the file has the same format
and includes the full directory path.
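As an alternative to a plain diff, the comparison could also be done on
the converted, pipe-delimited lines directly. The Python sketch below
assumes the field order shown earlier (directory, name, extension
first). The function names and the two root parameters are made up for
illustration: the roots are stripped so that listings taken from
different top-level paths still line up, and names are compared without
regard to case, as Windows does.

```python
def keys(records, root):
    """Map each pipe-delimited record to a (relative dir, name, ext) key."""
    result = set()
    for record in records:
        record = record.rstrip("\r\n")
        if not record:
            continue
        directory, name, ext = record.split("|")[:3]
        # Strip the root so both listings use paths relative to their top.
        if directory.lower().startswith(root.lower()):
            directory = directory[len(root):]
        result.add((directory.lower(), name.lower(), ext.lower()))
    return result


def missing(backup_records, recovered_records, backup_root, recovered_root):
    """Return sorted keys found in the backup listing but not the recovered one."""
    backup = keys(backup_records, backup_root)
    recovered = keys(recovered_records, recovered_root)
    return sorted(backup - recovered)
```

Sizes and timestamps are deliberately left out of the key here, so a
file that exists in both places but differs in content would not be
reported; adding the size field to the key would catch some of those.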
P.S. Here is the command I ran in a DOS box (aka command prompt window
etc.) from my Windows XP machine.
dir >dir.txt c:\_from_laptop /-C /ON /S /TW /4
The /-C means suppress the thousands separator in the size, /ON means
order by name, /S means recurse into subdirectories, /TW means show the
time each entry was last written, and /4 means show 4-digit years.
Thanks,
Steve