On Aug 1, 3:40 pm, [EMAIL PROTECTED] (Luke) wrote:
> I am new to perl. I am trying to analyze and insert into a database some
> data from a bunch of weird unicode log files. I did manage to complete
> Unicode and database part but now log analysis - that's the area where
> I need some help...
>
> Log file has following structure:
>
> **********************************************************
> Process Report File
> Start date & time: 6/4/2007 6:08:56 AM
>
> Process Settings:
> ------------------------------
> => Setting 1
> => Setting 2
> => Setting 3
> => Setting 4
>
> Report file: C:\logs\PSTMig_20070604060856.txt
>
> Files to be processed:
> ------------------------------------------------------
> PST: G:\#HOU- Template Boston ATPW_TEMPLATEBOSTON.pst --> Archive:
> JournHist01; Retention Category: AIM
> PST: G:\#HOU- Template New York ATPW_TEMPLATENEWYORKATPW.pst -->
> Archive: JournHist01; Retention Category: AIM
> PST: G:\#PWM- Exchange Test Account_EXCHANGETESTACCOUNT.pst -->
> Archive: JournHist01; Retention Category: AIM
>
> End time: 7:19:27 AM
> Report file: C:\Program Files\Enterprise Vault\Reports
> \PSTMig_20070604060856.txt
>
> Some of the PSTs contained items that were not eligible for archive.
> These items have not been archived and remain in the PSTs.
> See the Help for information on how to migrate these items.
>
> PSTs containing ineligible items:
>
> ---------------------------------------------------------------------------------------------------
> PST: G:\#HOU- Template Boston ATPW_TEMPLATEBOSTON.pst --> Vault:
> JournHist01
> PST: G:\#PWM- PF Admin_PWMPFADMIN.pst --> Vault: JournHist01
> PST: G:\698\ANTIGEN_BOSTON_ANTIGEN_BOSTON.pst --> Vault:
> JournHist01
>
> PSTs successfully processed:
> --------------------------------------------
> PST: G:\Carpenter, Jeffrey_JEFFREYC.pst --> Vault: JournHist01 (1
> items archived)
>
> PST: G:\DBAGENT - Boston_DBAGENTA.pst --> Vault: JournHist01 (0
> items archived)
>
> PST: G:\Demo Account_DACCOUNT.pst --> Vault: JournHist01 (0 items
> archived)
>
> PSTs partially processed:
> ---------------------------------------------------------------
> PST: G:\#HOU- Template Boston ATPW_TEMPLATEBOSTON.pst --> Vault:
> JournHist01
>
> Items archived: 0 (of 1)
> Items failed archiving: 0
> Items not eligible for archiving: 1
>
> PST: G:\#PWM- PF Admin_PWMPFADMIN.pst --> Vault: JournHist01
>
> Items archived: 0 (of 41)
> Items failed archiving: 0
> Items not eligible for archiving: 41
>
> PST: G:\ANTIGEN_BOSTON_ANTIGEN_BOSTON.pst --> Vault: JournHist01
>
> Items archived: 474 (of 614)
> Items failed archiving: 0
> Items not eligible for archiving: 140
>
> **********************************************************
>
> File is divided into 5 'parts' - I need to extract list of files to be
> processed and then search again for each file and find out what was a
> migration status, basically...
>
> My question is - how to parse such a file ? I am failing on whole
> logic.
>
> Can anyone help me ? Any ideas are greatly appreciated...

use strict;
use warnings;
use Data::Dumper;

# Looks like the file is small so simply treating it as a single string is
# likely to be the simplest approach.

# In reality you'd need to slurp the file in a Unicode-aware way
use File::Slurp 'slurp';
local *_;
$_ = slurp(\*DATA) or die $!;

# Looks like every line starts with 2 spaces.
# I'm assuming the continuation lines I saw in the OP were
# artefacts
s/^  //mg;

# Looks like a header then multiple sections delimited by a title followed
# by a colon, then a line with a lot of hyphens on it.
# Because the pattern has a capturing group, split returns the titles too,
# so the list is (header, title, body, title, body, ...).
my %sections = ('Header', split /\n(.*):\n-{10,}\n/);

# OK, now we have a hash of file sections

# The body of each section called PSTs... seems to have the same structure
for ( @sections{ grep { /^PSTs / } keys %sections } ) {
    # Appears to be delimited by lines starting PST: (but I want to
    # discard the bit before the first such line)
    my ( undef, %by_file ) = split /^PST: (.*) -->/m;

    # Same trick again
    for ( values %by_file ) {
        my %file_info = /^\s+(.+?):\s+(.*)/mg;

        # Normalize how the number of items archived and total
        # are represented
        no warnings 'uninitialized';
        if ( $file_info{Vault} =~ s/ \((\S+) items archived\)$// ) {
            $file_info{Archived} = $1;
            $file_info{Total}    = $1;
        }
        if ( $file_info{'Items archived'} =~ /(\S+) \(of (\S+)\)/ ) {
            $file_info{Archived} = $1;
            $file_info{Total}    = $2;
        }

        # Replace the value in %by_file with parsed structure
        $_ = \%file_info;
    }

    # Replace value in %sections with parsed structure
    $_ = \%by_file;
}

# The 'Files to be processed' section has another format
for ( $sections{'Files to be processed'} ) {
    my %by_file;

    # Consider only lines starting PST: and extract file name and info
    while ( /^PST: (.*) -->(.*)/mg ) {
        # Copy the captures before running another match
        my ( $pst, $info ) = ( $1, $2 );

        # Parse the semicolon delimited list of tagged values into a hash
        $by_file{$pst} = { $info =~ /\s*(.*?): (.*?)(?:;|$)/g };
    }
    $_ = \%by_file;
}

print Dumper \%sections;

__DATA__
  Process Report File
  Start date & time: 6/4/2007 6:08:56 AM

  Process Settings:
  ------------------------------
  => Setting 1
  => Setting 2
  => Setting 3
  => Setting 4

  Report file: C:\logs\PSTMig_20070604060856.txt

  Files to be processed:
  ------------------------------------------------------
  PST: G:\#HOU- Template Boston ATPW_TEMPLATEBOSTON.pst --> Archive: JournHist01; Retention Category: AIM
  PST: G:\#HOU- Template New York ATPW_TEMPLATENEWYORKATPW.pst --> Archive: JournHist01; Retention Category: AIM
  PST: G:\#PWM- Exchange Test Account_EXCHANGETESTACCOUNT.pst --> Archive: JournHist01; Retention Category: AIM

  End time: 7:19:27 AM
  Report file: C:\Program Files\Enterprise Vault\Reports \PSTMig_20070604060856.txt

  Some of the PSTs contained items that were not eligible for archive.
  These items have not been archived and remain in the PSTs.
  See the Help for information on how to migrate these items.

  PSTs containing ineligible items:
  ---------------------------------------------------------------------------------------------------
  PST: G:\#HOU- Template Boston ATPW_TEMPLATEBOSTON.pst --> Vault: JournHist01
  PST: G:\#PWM- PF Admin_PWMPFADMIN.pst --> Vault: JournHist01
  PST: G:\698\ANTIGEN_BOSTON_ANTIGEN_BOSTON.pst --> Vault: JournHist01

  PSTs successfully processed:
  --------------------------------------------
  PST: G:\Carpenter, Jeffrey_JEFFREYC.pst --> Vault: JournHist01 (1 items archived)

  PST: G:\DBAGENT - Boston_DBAGENTA.pst --> Vault: JournHist01 (0 items archived)

  PST: G:\Demo Account_DACCOUNT.pst --> Vault: JournHist01 (0 items archived)

  PSTs partially processed:
  ---------------------------------------------------------------
  PST: G:\#HOU- Template Boston ATPW_TEMPLATEBOSTON.pst --> Vault: JournHist01

      Items archived: 0 (of 1)
      Items failed archiving: 0
      Items not eligible for archiving: 1

  PST: G:\#PWM- PF Admin_PWMPFADMIN.pst --> Vault: JournHist01

      Items archived: 0 (of 41)
      Items failed archiving: 0
      Items not eligible for archiving: 41

  PST: G:\ANTIGEN_BOSTON_ANTIGEN_BOSTON.pst --> Vault: JournHist01

      Items archived: 474 (of 614)
      Items failed archiving: 0
      Items not eligible for archiving: 140