Summary:
I would like a perl script that converts the output of the Windows dir
command so that each line has the same format, including the directory
it is in, and its extension.  The date and time should use a format that
can be sorted as a string, e.g. yyyy-mm-dd and a 24-hour clock.
I think pipe delimited would work best, as the pipe character | cannot
appear in a file name, and that would let me sort the output, and/or
load it into a database.
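
To make the date and time requirement concrete, here is a rough sketch of
the conversion. It is in Python only to illustrate the rule (the script I
am asking for would be Perl), the function name sortable_timestamp is made
up, and it assumes the US-style mm/dd/yyyy dates and AM/PM times that my
dir output shows.

```python
from datetime import datetime

def sortable_timestamp(date_text, time_text):
    """Turn dir's 'mm/dd/yyyy' and 'hh:mm AM' into ('yyyy-mm-dd', 'HH:MM')."""
    # Assumption: month comes first and the clock is 12-hour with AM/PM.
    stamp = datetime.strptime(f"{date_text} {time_text}", "%m/%d/%Y %I:%M %p")
    return stamp.strftime("%Y-%m-%d"), stamp.strftime("%H:%M")

print(sortable_timestamp("04/14/2003", "11:00 PM"))  # ('2003-04-14', '23:00')
```

Both results sort correctly as plain strings, which is the point of the
format.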

Details:
I could probably write this in an hour but laziness is a virtue, and if
someone has got one already that will probably be better anyway.
I want to translate lines like this:

 Directory of C:\_from_laptop\AAA BBB_files

04/14/2003  10:21 AM               123 abc
04/14/2003  11:00 PM                 0 empty.jpg.txt

To lines something like this.  Note that I moved the file name and
extension earlier in the line, so that the natural sort is by directory
and file name, and a sort on the last two fields is by time.  (I have a
port of Unix sort in my c:\bin\ directory that I can use.)

C:\_from_laptop\AAA BBB_files|abc||File|123|2003-04-14|10:21
C:\_from_laptop\AAA BBB_files|empty.jpg|txt|File|0|2003-04-14|23:00


None of it is tricky.  You just need to remember what Directory line you
saw last, convert the date and time fields, insert either File or Dir
depending on its type, and write out each line that comes from a file or
dir (except skip all the . and .. dirs).  Note that a file named
foo.bar.txt has a name of foo.bar and extension of txt.  Some files can
have no extension, and some directories do have an extension.
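
The steps above can be sketched roughly as follows. This is Python used
only to pin down the logic, not the Perl script I am asking for. The names
convert, ENTRY and HEADER are made up for the illustration. It assumes the
listing was made with /-C and /4, that dates are mm/dd/yyyy with AM/PM
times, and that no name starts with a space. Leaving the size field empty
for directories is my guess at a sensible choice.

```python
import re

# An entry line: date, time, AM/PM, then <DIR> or a size, then the name.
ENTRY = re.compile(
    r"^(\d\d)/(\d\d)/(\d{4})\s+(\d\d):(\d\d) ([AP])M\s+(<DIR>|\d+)\s+(.*)$")
HEADER = re.compile(r"^ Directory of (.*)$")

def convert(lines):
    """Yield pipe-delimited records from the lines of a recursive dir listing."""
    current_dir = ""
    for line in lines:
        line = line.rstrip("\r\n")
        header = HEADER.match(line)
        if header:
            current_dir = header.group(1)  # remember the last Directory line
            continue
        entry = ENTRY.match(line)
        if not entry:
            continue  # volume boilerplate, totals, blank lines
        month, day, year, hour, minute, half, size, name = entry.groups()
        if name in (".", ".."):
            continue
        hour24 = int(hour) % 12 + (12 if half == "P" else 0)
        kind = "Dir" if size == "<DIR>" else "File"
        if kind == "Dir":
            size = ""  # assumption: directories carry no size
        # Split at the last dot only; no dot means an empty extension.
        base, dot, ext = name.rpartition(".")
        if not dot:
            base, ext = name, ""
        yield "|".join([current_dir, base, ext, kind, size,
                        f"{year}-{month}-{day}", f"{hour24:02d}:{minute}"])

sample = [
    r" Directory of C:\_from_laptop\AAA BBB_files",
    "",
    "01/17/2005  09:53 AM    <DIR>          .",
    "04/14/2003  10:21 AM               123 abc",
    "04/14/2003  11:00 PM                 0 empty.jpg.txt",
]
for record in convert(sample):
    print(record)
```

Run on those sample lines, it prints the abc and empty.jpg.txt records in
the pipe-delimited layout shown earlier and skips the . entry.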

Here is an excerpt of the output.  (Because it is an excerpt the totals
for Files and Bytes are not right.)
Note that there are a few lines of boilerplate at the beginning which
can be ignored, and a few lines at the end which can be ignored (or used
as a sanity check on the totals.)  Note that a file might not have an
extension, that a file or directory can be empty, and that a name can
contain white space and strange characters.

 Volume in drive C has no label.
 Volume Serial Number is A898-B50D

 Directory of C:\_from_laptop

01/23/2005  08:37 AM    <DIR>          .
01/23/2005  08:37 AM    <DIR>          ..
04/14/2003  01:46 PM    <DIR>          _from_c
02/06/2001  01:34 PM             15618 0101.txt
02/06/2001  01:34 PM             15618 abc
04/14/2003  10:22 AM             32451 AAA BBB.htm
01/17/2005  09:53 AM    <DIR>          AAA BBB_files
04/04/2000  06:14 PM             27648 acm_pubform.doc
01/17/2005  09:53 AM    <DIR>          acrobat
01/17/2005  09:54 AM    <DIR>          address
08/17/2004  10:04 AM                 0 zzz
             650 File(s)       92010877 bytes

 Directory of C:\_from_laptop\AAA BBB_files

01/17/2005  09:53 AM    <DIR>          .
01/17/2005  09:53 AM    <DIR>          ..
04/14/2003  10:21 AM              1045 abc
04/14/2003  10:21 AM                  0 empty.jpg.txt
04/14/2003  10:22 AM             32451 AAA BBB CCC.htm
01/17/2005  09:53 AM    <DIR>          AAA BBB_CCC_files
04/14/2003  10:21 AM                43 spacer.gif
              11 File(s)          37476 bytes

 Directory of C:\_from_laptop\AAA BBB CCC_files

01/17/2005  09:53 AM    <DIR>          .
01/17/2005  09:53 AM    <DIR>          ..
               0 File(s)              0 bytes

     Total Files Listed:
           245909 File(s)    28969650933 bytes
           154376 Dir(s)     31272304640 bytes free


Background:
My laptop died a few days ago.  The process to recover files and
directories from it seems to have missed lots of files.  I have a
directory on another machine that I have been backing up to.  I want to
find out which files are missing.  I have run dir on the backup
machine, and will run dir on the new machine, and then diff the outputs.
The diff will work best if each line in the file has the same format
and includes the full directory path.
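
Once both listings are in the pipe-delimited form, the comparison could
also be done directly instead of with diff. A rough sketch in Python
(illustration only; the function name missing_entries is made up, and it
assumes both listings use the same directory paths):

```python
def missing_entries(backup_lines, recovered_lines):
    """Return backup records whose directory|name|extension is not in the
    recovered listing."""
    def key(record):
        # Windows file names are case-insensitive, so compare in lower case.
        return tuple(record.rstrip("\r\n").lower().split("|")[:3])

    recovered = {key(record) for record in recovered_lines}
    return [record.rstrip("\r\n") for record in backup_lines
            if key(record) not in recovered]

backup = [
    r"C:\_from_laptop|abc||File|123|2003-04-14|10:21",
    r"C:\_from_laptop|empty.jpg|txt|File|0|2003-04-14|23:00",
]
recovered = [
    r"c:\_from_laptop|ABC||File|123|2005-01-23|08:37",
]
print(missing_entries(backup, recovered))
```

Only the first three fields are compared, so a file that was recovered
with a new timestamp is not reported as missing.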

P.S. Here is the command I ran in a DOS box (aka command prompt window
etc.) from my Windows XP machine.

dir >dir.txt c:\_from_laptop /-C /ON /S /TW /4

The /-C means suppress the thousands separator in the size, /ON means
order by name, /S means recurse into subdirectories, /TW means show the
last time it was written, and /4 means show 4 digit years.



Thanks,
Steve
 
_______________________________________________
Boston-pm mailing list
[email protected]
http://mail.pm.org/mailman/listinfo/boston-pm