Re: [Ilugc] gawk query - csv processing. moving away from MS-Access

Raman.P Tue, 03 Nov 2009 02:29:54 -0800

--- On Tue, 3/11/09, Siva Subramanian <[email protected]> wrote:


> I am into my 1st BSc Statistics and into a project for an
> MNC. I am
> trying my first hand at linux and was recently introduced
> to gawk.
> 
> 
> 
> I am having trouble processing a statistical dump that is
> provided to us in the
> form of a csv file. The format of the file is given below
> 
> 
> 
> C_ID, ID_NO, stat1, vol2, amount3,... 
> 
> 
> 
> There are around 40 fields and the csv file has close to a
> million records
> 
> 
> 
> The C_ID, is the customer id and is only way to identify
> the customer.
> 
> The ID_NO field is the premium plan the customer is in
> 
> stat1, vol2, amounts are all numbers
> 
> 
> 
> I can write a query that uses a few if statements and gets
> the details if the
> ID_NO is sequential. 
> 
> 1.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> However,
> there are over 1000 different ID_NO (legacy) and we need to
> add the stat1, vol2,
> amount3 for each of the ID_NOs separately (I need to group
> the ID_NO, and sum
> of the fields)
> 
> 2.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
> If
> I have the C_ID of the customers in a separate csv, is it
> possible to compare
> the C_ID with that of the C_ID in the dump and determine
> the sum of stat1,
> vol2, amount3…&nbsp; (sum of the fields only
> for a set of customers and not for the ID_NOs in whole)
> 
> We are extensively using
> MS-Access for this and it has been a pain. A friend
> suggested that I try my
> hand at using tools in linux.
> 
> I am not sure if this is the
> right mailing list for this. 
> 
> Will really appreciate any help
> in this regard.

I cannot help on gawk. However perl has nice dbi interface to csv files.

here is the standard example for reading /etc/passwd using sql

require DBI;
           my $dbh = DBI->connect("DBI:CSV:");
           $dbh->{'csv_tables'}->{'passwd'} = {
               'eol' => "\n",
               'sep_char' => ":",
               'quote_char' => undef,
               'escape_char' => undef,
               'file' => '/etc/passwd',
               'col_names' => ["login", "password", "uid", "gid", "realname",
                               "directory", "shell"]
           };
           $sth = $dbh->prepare("SELECT * FROM passwd where login like 'ram%'");
$sth->execute();
while (@array=$sth->fetchrow_array()){
        print "@array\n\n";
}

There are specialised modules for csv at cpan.org. Million records should not 
be a problem at all.

Raman.P
blog:http://ramanchennai.wordpress.com/



      Keep up with people you care about with Yahoo! India Mail. Learn how. 
http://in.overview.mail.yahoo.com/connectmore
_______________________________________________
To unsubscribe, email [email protected] with
"unsubscribe <password> <address>"
in the subject or body of the message.
http://www.ae.iitm.ac.in/mailman/listinfo/ilugc

Re: [Ilugc] gawk query - csv processing. moving away from MS-Access

Reply via email to