Hi Shlomi,

Thanks for your comments about best practice, which I have implemented.
Any ideas on why my hash of arrays of arrays is misbehaving?

Thanks,
Nat

On 1 Jul 2015, at 15:42, Shlomi Fish <shlo...@shlomifish.org> wrote:

> Hi Nat,
>
> some comments about your code.
>
> On Wed, 1 Jul 2015 13:00:53 +0100
> nco...@ebi.ac.uk wrote:
>
>> Hi,
>> I need some help with a hash of array of array.
>> this is my input data structure:
>> gene a al1 data1 data2 data9
>> gene b al2 data3 data4 data10
>> gene b al3 data5 data6 data12
>> gene b al4 data7 data8 data12
>>
>> I take each data variable, see above, from a sql query and parse the data
>> to build a new data structure: a hash of arrays of arrays.
>> In the input data presented here, the first column will be the key of the
>> hash and the other 4 columns should compose 4 arrays.
>> example :Each gene (gene a gene b ..) should be the keys, the al column
>> should be the first array, data1,3,5,7 should be in the second array and
>> so on for the third and fourth array. further more the data in the arrays
>> should be unique. Data should be grouped depending on their keys, I am
>> expecting this structure below, furthermore each variable in an array
>> should be unique. ex: for gene b key in the 4th array data12 should appear
>> once.
>>
>> Thanks for any tips.
>> Nat
>>
>> this is what I would like to achieve (dataDumper format)
>>
>> $VAR1 = {
>>     gene b => [
>
> Note that you need to quote the hash key if it contains whitespace:
>
>     'gene b' => [
>
>>         [
>>             'al4','al2','al3'
>>         ],
>>         [
>>             'data7','data3','data5'
>>         ],
>>         [
>>             'data8','data4','data6'
>>         ],
>>         [
>>             'data12','data10'
>>         ]
>>     ],
>>
>>     gene a => [
>>         ['al1'],
>>         ['data1'],
>>         ['data2'],
>>         ['data9']
>>     ]
>> };
>>
>>
>> using the script below and the data dumper, I create a structure which is
>> not correct some of the data is erased and other grouped wrongly, 'gene a'
>> data goes into 'gene b' . gene a is empty.
>>
>> $VAR1 = {
>>     gene b => [
>>         [
>>             'al1',
>>             'al2',
>>             'al3'
>>         ],
>>         [
>>             'data1',
>>             'data3',
>>             'data5'
>>         ],
>>         [
>>             'data2',
>>             'data4',
>>             'data6'
>>         ],
>>         [
>>             'data9',
>>             'data10',
>>             'data12'
>>         ]
>>     ],
>>     gene a => [
>>         [],
>>         [],
>>         [],
>>         []
>>     ]
>> };
>>
>>
>>
>> here is my code
>>
>> #!/usr/local/bin/perl
>> use strict;
>> use warnings;
>
> It's good that you are using strict and warnings.
>
>> use DBI;
>> use Data::Dumper;
>> use List::MoreUtils qw(uniq);
>>
>>
>>
>> my $subrow_hash;
>> my $row_hash;
>> my %hasharray;
>
>
> You should not predeclare your variables:
>
> http://perl-begin.org/tutorials/bad-elements/#declaring_all_vars_at_top
>
>>
>>
>> my $geno_dbh = DBI->connect( credential...} ) || die "Database
>> connection not made: $DBI::errstr";
>> print STDERR "Connection...\n";
>>
>>
>> my $subsql = "SELECT * FROM table_2015";
>> #this is the table structure
>> #gene a al1 data1 data2 data9
>> #gene b al2 data3 data4 data10
>> #gene b al3 data5 data6 data12
>> #gene b al4 data7 data8 data12
>>
>
> In general "SELECT *" is not recommended and you should list your fields
> explicitly.
>
>>
>> my $subresult = $geno_dbh->prepare($subsql);
>> $subresult->execute() or die "SQL Error: $DBI::errstr\n";
>>
>> my @gene_name_list;
>> my @allele_list;
>> my @mp_list;
>> my @mp_list_def;
>> my @unique_mp_def;
>> my $Xref;
>> my @unique_gene_name_list ; # unique gene only
>> my @unique_allele_list;# unique allele only
>> my @unique_mp_list;# unique MP terms only
>> my $list;
>
> More predeclarations.
>
>> while ( $subrow_hash = $subresult->fetchrow_hashref) {
>>
>
> Better have a "my $subrow_hash" here.
>
>>     my $symbol_id=$subrow_hash->{symbol};#this is the first set of data
>> for the first array
>
> 1. The line is too long.
>
> 2. There are no spaces around the equal sign "=".
>
> 3. Perhaps make it an array.
>
>>
>>     my $allele_id=$subrow_hash->{allele_symbol};#this is the 2nd set of
>> data for the 2nd array
>>
>>     my $mp_id=$subrow_hash->{phenotype_acc};#this is the 3rd set of data
>> for the 3rd array
>>
>>     my $mp_def=$subrow_hash->{name};#this is the fourth set of data for
>> the fourth array
>>
>>     $Xref=$subrow_hash->{xref_acc}; #this is the key of the hash (gene a
>> and gene b)
>>
>>
>>     $list=[[@gene_name_list], [@allele_list], [@mp_list],
>> [@mp_list_def]];#create arrays of arrays
>>
>
> $list here accumulates several lists.
>
>
>>     if ($Xref){
>>
>>
>>         $hasharray{$Xref} = $list; #create a hash of arrays of arrays
>> for a specific key
>>         @gene_name_list =@{$list->[0]}; #maybe not necessary to
>> declare this. @allele_list =@{$list->[1]};
>>         @mp_list =@{$list->[2]};
>>         @mp_list_def =@{$list->[3]};
>>
>
> Why is the assignment to the arrays again necessary.
>
>>
>>
>>         if ($symbol_id){
>
> It's better to use defined here:
>
> http://perldoc.perl.org/functions/defined.html
>
>>             push (@gene_name_list, $symbol_id); #fill arrays with data
>>         }
>>         if ($allele_id){
>>             push (@allele_list, $allele_id);
>>         }
>>         if ($mp_id){
>>             push (@mp_list, $mp_id);
>>         }
>>         if ($mp_def){
>>             push (@mp_list_def, $mp_def);
>>         }
>>     }
>>
>>
>
> Perhaps you wish to peruse the resources in:
>
> http://perl-begin.org/topics/references/
>
> Regards,
>
> Shlomi Fish
>
>
> --
> -----------------------------------------------------------------
> Shlomi Fish http://www.shlomifish.org/
> List of Text Editors and IDEs - http://shlom.in/IDEs
>
> Chuck Norris can read Perl code that was RSA encrypted.
> - http://www.shlomifish.org/humour/bits/facts/Chuck-Norris/
>
> Please reply to list if it's a mailing list post - http://shlom.in/reply .

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
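
A possible reading of the misbehaviour, going only by the quoted script: the four arrays are declared once outside the loop, so every key shares them, and $list is copied from them before the current row has been pushed. That would leave the first key with four empty arrays, carry earlier rows into later keys, and never store the last row, which matches the dump in the post. The sketch below is one way to group per key instead. It is not the original script: the @rows sample data, the group_rows name and the %seen bookkeeping are all assumptions made for illustration, and the rows are plain array references rather than DBI results.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample rows mirroring the input table in the post:
# the first field is the hash key, the other four feed the four arrays.
my @rows = (
    [ 'gene a', 'al1', 'data1', 'data2', 'data9'  ],
    [ 'gene b', 'al2', 'data3', 'data4', 'data10' ],
    [ 'gene b', 'al3', 'data5', 'data6', 'data12' ],
    [ 'gene b', 'al4', 'data7', 'data8', 'data12' ],
);

# Build { key => [ [column 1 values], [column 2 values], ... ] },
# keeping each value only once per key and per column.
sub group_rows {
    my @input = @_;
    my ( %grouped, %seen );

    for my $row (@input) {
        my ( $key, @values ) = @$row;
        next unless defined $key;

        for my $i ( 0 .. $#values ) {
            my $value = $values[$i];
            next unless defined $value;

            # Skip a value already stored in this column for this key.
            next if $seen{$key}[$i]{$value}++;

            # Autovivification creates the inner array on first use,
            # so nothing is shared between keys.
            push @{ $grouped{$key}[$i] }, $value;
        }
    }
    return \%grouped;
}

my $grouped = group_rows(@rows);

$Data::Dumper::Sortkeys = 1;
print Dumper($grouped);
```

With a DBI statement handle, the same loop body could be fed from each fetched row instead of @rows. Values are kept in order of first appearance, so in this sample 'data12' is stored once in the fourth array of 'gene b'.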