I've got a rather baffling situation involving a daily file update process.  
I suspect it's related to content indexing, but I'm not sure that it is or, 
if so, how I'd go about resolving it.

We run a process every night which goes out to a web site and downloads all 
updated files from a folder.  This usually amounts to between 700 and 1000 zip 
files.  The contents are then extracted to refresh a local copy of the data.  
Each zip file contains an Access MDB file which gets modified whenever any of 
the other files in that zip file change.  Because the provider of the original 
data ends up touching some of the zip files even though they may not have 
actually changed (don't ask, I have no clue why), it turns out that only about 
20% of those zip files actually have changes and need further processing.  
That's where the weird part comes in.

The overall data collection consists of about 1.5M files in 1600 folders: one 
folder for each of the possible zip files from the original data, and 
therefore one matching MDB file per folder.  I've scripted a check for updated 
MDB files using a couple of methods.  When I test them manually they take about 
10 minutes to complete.  During our nightly processing the same check takes 
about 2.5-3.0 hours.  I initially thought it was I/O contention with backups, 
etc., but that doesn't appear to be the case.  My initial method for the check 
was using the replace command to copy the updated files to a target folder.  
Since that has to be done as part of the process anyway, it should be a 
zero-cost choice.  When it was taking upwards of 3 hours I tested the process 
using forfiles instead.  Hey, 3 hours down to 10 minutes... WOOHOO!  But when I 
did a sanity check by putting both the forfiles command and the replace command 
into the nightly processing script, the forfiles piece now takes up to 3 hours 
and the replace command takes... wait for it... yep, 10 minutes.  This depends 
on the order, of course: if I swap them, the times swap as well.
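
For comparing timings outside of forfiles and replace, the same kind of check 
can be sketched in Python.  This is only an illustration, not the nightly 
script itself: the function names (find_updated_mdbs, timed_pass), the 24-hour 
cutoff, and the root path argument are all assumptions.  It walks the folder 
tree, collects .mdb files modified since the cutoff, and times two 
back-to-back passes so the first and second runs can be compared directly.

```python
import os
import sys
import time


def find_updated_mdbs(root, since_epoch):
    """Return sorted paths of .mdb files under root modified at or after since_epoch."""
    updated = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".mdb"):
                full_path = os.path.join(dirpath, name)
                if os.stat(full_path).st_mtime >= since_epoch:
                    updated.append(full_path)
    return sorted(updated)


def timed_pass(root, since_epoch):
    """Run one scan and return (matching paths, elapsed seconds)."""
    start = time.perf_counter()
    found = find_updated_mdbs(root, since_epoch)
    return found, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical usage: pass the data root as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    cutoff = time.time() - 24 * 3600  # assumed window: the last 24 hours
    for label in ("first pass", "second pass"):
        found, elapsed = timed_pass(root, cutoff)
        print(f"{label}: {len(found)} updated MDB files in {elapsed:.1f}s")
```

Running this once inside the nightly job and once by hand would show whether 
the slow-first, fast-second pattern follows the scan itself rather than the 
particular command used.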

As I said, my guess is this has to do with indexing on the volume, but if 
that's the case, how do I confirm that fact, and more importantly, is there 
anything I can do to fix it?
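
One small piece of the "how do I confirm" question can at least be sketched: 
counting which folders carry the Windows "not content indexed" attribute.  
This is a rough sketch under stated assumptions.  The helper names are 
hypothetical, it relies on st_file_attributes, which Python only provides on 
Windows, and it only reports how the folders are flagged.  It does not show 
whether indexing is actually responsible for the slowdown.

```python
import os
import stat
import sys


def is_excluded_from_index(attributes):
    """True if the attribute bits include FILE_ATTRIBUTE_NOT_CONTENT_INDEXED."""
    return bool(attributes & stat.FILE_ATTRIBUTE_NOT_CONTENT_INDEXED)


def folders_by_index_flag(root):
    """Split folders under root into (indexable, excluded) lists by attribute."""
    indexable, excluded = [], []
    for dirpath, _dirnames, _filenames in os.walk(root):
        attrs = getattr(os.stat(dirpath), "st_file_attributes", None)
        if attrs is None:
            # st_file_attributes is not available outside Windows.
            raise OSError("file attributes are only available on Windows")
        (excluded if is_excluded_from_index(attrs) else indexable).append(dirpath)
    return indexable, excluded


if __name__ == "__main__":
    # Hypothetical usage: pass the data root as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    indexable, excluded = folders_by_index_flag(root)
    print(f"{len(indexable)} folders indexable, {len(excluded)} excluded")
```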

--------------------
Melvin Backus | Sr. Systems Engineer | Byers Engineering Company | 404.497.1565
Service Desk | 404-497-1599 | http://servicedesk.byers.com
--
There are 10 kinds of people in the world...
         those who understand binary and those who don't.

