--- On Tue, 3/11/09, Siva Subramanian <[email protected]> wrote:
> I am into my 1st BSc Statistics and into a project for an
> MNC. I am
> trying my first hand at linux and was recently introduced
> to gawk.
>
>
>
> I am having trouble processing a statistical dump that is
> provided to us in the
> form of a csv file. The format of the file is given below
>
>
>
> C_ID, ID_NO, stat1, vol2, amount3,...
>
>
>
> There are around 40 fields and the csv file has close to a
> million records
>
>
>
> The C_ID, is the customer id and is only way to identify
> the customer.
>
> The ID_NO field is the premium plan the customer is in
>
> stat1, vol2, amounts are all numbers
>
>
>
> I can write a query that uses a few if statements and gets
> the details if the
> ID_NO is sequential.
>
> 1.
> However,
> there are over 1000 different ID_NO (legacy) and we need to
> add the stat1, vol2,
> amount3 for each of the ID_NOs separately (I need to group
> the ID_NO, and sum
> of the fields)
>
> 2.
> If
> I have the C_ID of the customers in a separate csv, is it
> possible to compare
> the C_ID with that of the C_ID in the dump and determine
> the sum of stat1,
> vol2, amount3… (sum of the fields only
> for a set of customers and not for the ID_NOs in whole)
>
> We are extensively using
> MS-Access for this and it has been a pain. A friend
> suggested that I try my
> hand at using tools in linux.
>
> I am not sure if this is the
> right mailing list for this.
>
> Will really appreciate any help
> in this regard.
I cannot help on gawk. However perl has nice dbi interface to csv files.
here is the standard example for reading /etc/passwd using sql
require DBI;
my $dbh = DBI->connect("DBI:CSV:");
$dbh->{'csv_tables'}->{'passwd'} = {
'eol' => "\n",
'sep_char' => ":",
'quote_char' => undef,
'escape_char' => undef,
'file' => '/etc/passwd',
'col_names' => ["login", "password", "uid", "gid", "realname",
"directory", "shell"]
};
$sth = $dbh->prepare("SELECT * FROM passwd where login like 'ram%'");
$sth->execute();
while (@array=$sth->fetchrow_array()){
print "@array\n\n";
}
There are specialised modules for csv at cpan.org. Million records should not
be a problem at all.
Raman.P
blog:http://ramanchennai.wordpress.com/
Keep up with people you care about with Yahoo! India Mail. Learn how.
http://in.overview.mail.yahoo.com/connectmore
_______________________________________________
To unsubscribe, email [email protected] with
"unsubscribe <password> <address>"
in the subject or body of the message.
http://www.ae.iitm.ac.in/mailman/listinfo/ilugc