Hi,

We have an open audit issue regarding files that are pulled from external interfaces. We download these files using the wget utility. The wget commands are called from Pro*C batches. For reference, the code is something like:

<< sprintf (WGET, "%s%s%s/%s.%s", "wget -P ", FEEDFILE_PATH, " ftp://username:passw...@host", FileName, "Z"); >>
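To make the call pattern concrete, here is a minimal sketch of how such a command could be built with a bounded buffer and run with its exit status checked. It is not our actual batch code: the names build_wget_command and run_command are hypothetical, the URL base is passed in as a parameter rather than hard-coded, and a POSIX system is assumed. A zero exit status only says that wget reported no error; it does not by itself prove the file is complete.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: build "wget -P <dest_dir> <url_base>/<file_name>.Z"
 * into buf. Returns 0 on success, -1 if formatting failed or the command
 * would not fit in buflen bytes (snprintf never overruns the buffer). */
int build_wget_command(char *buf, size_t buflen, const char *dest_dir,
                       const char *url_base, const char *file_name)
{
    int n = snprintf(buf, buflen, "wget -P %s %s/%s.Z",
                     dest_dir, url_base, file_name);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Hypothetical helper: run a shell command and return its exit status,
 * or -1 if it could not be started or did not exit normally. */
int run_command(const char *cmd)
{
    int status = system(cmd);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The batch would call build_wget_command first, stop if it returns -1, and treat any nonzero result from run_command as a failed download before doing any further checks.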
Now, the audit issue is to ensure data integrity and data completeness for the file that has been downloaded using wget.

Option 1 -> The recommended option is of course the checksum approach: we get the checksum (any checksum, e.g. MD5 or SHA-1) of the file on the remote server, then compute the checksum of the local file that wget has just downloaded, and compare the two to confirm that the file was downloaded successfully and completely. I checked Google and the wget manual. wget does not provide any option to get the checksum, although its source contains files such as gnu_md5.c; I don't know what these are used for.

Option 2 -> Check the file size on the remote FTP server. After retrieving the file with wget, our application can compare the remote file size with the size of the retrieved file. If the sizes do not match, an error is raised. wget does not provide any direct option for getting the file size, but it does give that information in its output messages, e.g.

*************************************************************************
--2009-07-28 09:52:41-- ftp://....
Resolving http-proxy.gslb.db.com... 10.233.152.36
Connecting to http-proxy.gslb.db.com|10.233.152.36|:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: 22774 (22K)
Saving to: `C090725.eod'

100%[===================================================================================>] 22,774 --.-K/s in 0.004s

Last-modified header missing -- time-stamps turned off.
2009-07-28 09:52:43 (5.09 MB/s) - `C090725.eod' saved [22774/22774]
*************************************************************************

The issue is that I cannot automate this. If my batch reads this output (e.g. a grep on the file size or on "100%"), it depends on text that is not guaranteed to stay the same across wget versions; the output text can change in a new version of wget. Even with the same version, if I fetch a different file from a different server, the output message is different.
So, I do not want to rely on this information. For example, this is the output for another file on another server:

*************************************************************************
Connecting to 10.140.76.100:21... connected.
Logging in as pardev ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD not needed.
==> SIZE CLO_090722.csv_22-07-2009 ... 5147277786087434
==> PASV ... done. ==> RETR CLO_090722.csv_22-07-2009 ... done.
Length: 5147277786087434 (4.6P)

0% [ ] 1,198,444 --.-K/s in 0.1s

2009-07-28 10:17:37 (8.89 MB/s) - `CLO_090722.csv_22-07-2009' saved [1198444]
*************************************************************************

Option 3 -> I looked at other options and found this in the manual: "When running Wget with -N, with or without -r, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file." So we thought that, after downloading the file with wget, we could run wget -N on it. If that command reports that the file is the same, this would imply that (timestamp, size) on the local server is the same as (timestamp, size) on the remote server. But when I tried this option in my production environment, I got this message:

<< Proxy request sent, awaiting response... 400 Bad Request
2009-07-28 09:55:39 ERROR 400: Bad Request. >>

This was working fine with a sample file in the test environment.

My requirement is very simple: to ensure data completeness and integrity. Can somebody please suggest which options I should or can use? My first preference is to compare checksums.

Thanks & Regards,
Anamika Jindal