Hi,
I created a plug-in for merging spots a while ago (see attachment) that
might be useful for you. It is written in Perl and merges all
duplicate spots on each individual array by calculating the geometric
mean intensities for each reporter. The merged intensities for a specific
reporter are stored at the same position for all arrays in the resulting
BioAssay.
Regards,
Adam Ameur
PhD Student,
The Linnaeus Centre for Bioinformatics
Uppsala University, Sweden
charles girardot wrote:
Hi there,
I wonder if somebody has a plug-in that merges data sets (BioAssay)
from different array designs together. Let me be a bit more specific.
We have array designs (printed ourselves) that are very similar yet
a bit different (e.g. the order in which plates have been printed, or
some plates are missing in one design). If I am correct, BASE matches
up spots based on their position when, e.g., plotting scatter plots to
compare signals in hyb 1 to those in hyb 2.
When I created the array designs, I made sure that the features point
to the same reporters. What we'd like to do is to compare results
obtained for the same reporter on the different designs. Note that
here reporter = probe, i.e. we want to compare things that are really the
same, just spotted at different places.
I was thinking about a plug-in that would:
1. Arbitrarily pick one of the 2 array designs as a reference
2. Convert the other array design to this reference
3. Keep only spots (based on reporter id matching) that are common to
both designs; the other ones would be filtered out
4. Make the resulting value of merged spots customizable, e.g. the
mean, or the highest, or whatever
5. Keep the relation to the parent BioAssay
Or something similar.
Any suggestions?
Cheers
Charles
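Steps 1 to 4 of the plug-in outlined above could be sketched roughly as follows in Perl. This is only an illustration under stated assumptions, not an existing BASE plug-in: the in-memory layout (one hash per assay, reporter id -> intensity), the function name merge_by_reporter and the example data are all hypothetical, and step 5 (keeping the relation to the parent BioAssay) depends on BASE itself and is not covered.

```perl
use strict;
use warnings;

# Hypothetical helper: $reference and $other are hash refs mapping
# reporter id -> intensity for two assays on different array designs.
# $combine is a code ref that decides how two values are merged.
# Returns a hash ref holding only reporters present in BOTH assays.
sub merge_by_reporter {
    my ($reference, $other, $combine) = @_;
    my %merged;
    for my $reporter (keys %$reference) {
        # Step 3: filter out reporters missing from the other design
        next unless exists $other->{$reporter};
        # Step 4: the merged value is chosen by the caller
        $merged{$reporter} =
            $combine->($reference->{$reporter}, $other->{$reporter});
    }
    return \%merged;
}

# Two example combine functions: the mean and the highest value
my $mean = sub { ($_[0] + $_[1]) / 2 };
my $max  = sub { $_[0] > $_[1] ? $_[0] : $_[1] };

# Made-up intensities, for illustration only
my %design_a = (R1 => 100, R2 => 400, R3 => 50);
my %design_b = (R2 => 200, R3 => 70, R4 => 900);

my $by_mean = merge_by_reporter(\%design_a, \%design_b, $mean);
my $by_max  = merge_by_reporter(\%design_a, \%design_b, $max);

# Only R2 and R3 are common to both designs
for my $reporter (sort keys %$by_mean) {
    print join("\t", $reporter, $by_mean->{$reporter},
               $by_max->{$reporter}), "\n";
}
```

Passing the combine function as a code ref keeps the merged value customizable without touching the matching logic.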
_______________________________________________
The BASE general discussion mailing list
basedb-users@lists.sourceforge.net
unsubscribe: send a mail with subject "unsubscribe" to
[EMAIL PROTECTED]
#!/usr/local/bin/perl -w
#######################################################################
#
# File: merge_spots.pl
#
# Creator: Adam Ameur, Linnaeus Centre for Bioinformatics, 7/4 2004
#
# This plug-in merges all replicate spots on each array by computing
# the geometric mean of intensity1 and intensity2. All spots that
# have the same reporter id are considered to be replicates. The
# result is always stored at a new position.
#
# Indata to this plugin is a serial BASEfile with columns: position,
# reporter, intensity1 and intensity2.
#
# Changed 6/4 2005: Geometric mean is calculated using log and exp.
######################################################################
sub merge_spots;
# Reporter -> New position number (shared by all assays in the file)
my %position_number;
my $current_position_number=1;
MAIN:
{
# Pass the filehandle as a glob reference rather than a bareword
my $status = merge_spots(\*STDIN);
if ($status == -1){
die "merge_spots failed.\n";
}
}
# Takes a string with comma separated numbers as argument
# and returns the geometric mean of the numbers. Values that
# are empty, zero or negative are skipped, since they have no
# logarithm. Returns -1 if no usable numbers are found.
#
# Changed 5/4 2004. Geometric mean is calculated using log and exp.
sub geometric_mean{
my $intensity_str = $_[0];
my @values = ($intensity_str =~ /([^\,]+)/g);
# Sum of logs and number of values used. Both start at zero so that
# a log sum of exactly 0 (e.g. an intensity of 1) is not mistaken
# for "no value seen yet".
my $log_sum = 0;
my $nr_values = 0;
for my $value (@values){
if($value > 0){
$log_sum += log $value;
$nr_values++;
}
}
if($nr_values == 0){
return -1;
}
# Geometric mean = exp(mean of the logs)
return exp($log_sum/$nr_values);
}
##########################################
#
# Functions for merging a serial BASEfile
#
##########################################
# This function merges the intensity values for all data in section
# spots on reporter id. The resulting intensity values are computed
# as the geometric mean of the intensities of the duplicate spots.
# The resulting position is a new position number that is assigned the
# first time a reporter is seen and reused for that reporter in every
# following assay.
sub merge_spots_for_assay{
my $stream = $_[0];
my $line = $_[1];
my $out_stream = $_[2];
# The section spot header is printed directly to stdout.txt
print $out_stream $line;
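while(!($line =~ /\%/)){
$line = <$stream>;
# Fail instead of looping forever if the file ends before the '%' line
return -1 unless defined $line;
print $out_stream $line;
}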
# Store all data in section spots in the mappings below:
my %reporter_ids; # Reporter -> {"Present"}
my %intensity1; # Reporter -> Intensity1 values
my %intensity2; # Reporter -> Intensity2 values
# Read data lines until a blank line or the end of the file
while(defined($line) && !($line =~ /^(\s+)$/)){
if ($line =~ /(\S+)\s+(\S+)\s+(\S+)\s+(\S+)$/){
my $position = $1;
my $reporter_id = $2;
my $int1 = $3;
my $int2 = $4;
$reporter_ids{$reporter_id}="Present";
if(!$position_number{$reporter_id}){
$position_number{$reporter_id}=$current_position_number;
$current_position_number++;
}
if(!$intensity1{$reporter_id}){
$intensity1{$reporter_id}=$int1;
}
else{
$intensity1{$reporter_id}=$intensity1{$reporter_id}.",".$int1;
}
if(!$intensity2{$reporter_id}){
$intensity2{$reporter_id}=$int2;
}
else{
$intensity2{$reporter_id}=$intensity2{$reporter_id}.",".$int2;
}
}
$line = <$stream>;
}
# Go through all mappings and print the merged data
my @rep_ids = sort {$a <=> $b} keys %reporter_ids;
for my $rep_id (@rep_ids){
my $intensity1_str = $intensity1{$rep_id};
my $intensity2_str = $intensity2{$rep_id};
# Compute new values for position and intensities
my $new_pos = $position_number{$rep_id};
my $new_int1 = geometric_mean($intensity1_str);
my $new_int2 = geometric_mean($intensity2_str);
if($new_pos == -1 || $new_int1 == -1 || $new_int2 == -1){
return -1;
}
# Print the merged information
print $out_stream
$new_pos."\t".$rep_id."\t".$new_int1."\t".$new_int2."\n";
}
print $out_stream "\n";
return 0;
}
# Function for merging all spots in a serial BASEfile,
# and writing the output to the file stdout.txt
sub merge_spots{
my $stream = $_[0];
my $outfile = "stdout.txt";
open(OUT_FILE, ">".$outfile) or die "Can't open ".$outfile."\n";
print OUT_FILE "BASEfile\n";
# Loop through file
while(!eof($stream)){
my $line = <$stream>;
# Merge all data in a spots section
if(($line =~ /section\s+spots/)){
# Pass the output filehandle as a glob reference rather than a bareword
my $status = merge_spots_for_assay($stream, $line, \*OUT_FILE);
if($status == -1){
return -1;
}
}
}
close(OUT_FILE);
return 0;
}
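
As a quick check of the log/exp approach used in geometric_mean above, here is a small standalone sketch. The helper name geom_mean_list is hypothetical and not part of the plug-in; it takes a plain list instead of a comma separated string and assumes that non-positive values should simply be skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: geometric mean of the positive numbers in a list,
# computed as exp(mean(log x)) rather than as the n-th root of a product,
# which could overflow for many large intensities.
sub geom_mean_list {
    my @positive = grep { $_ > 0 } @_;
    return undef unless @positive;   # nothing usable to average
    my $log_sum = 0;
    $log_sum += log($_) for @positive;
    return exp($log_sum / @positive);
}

# sqrt(2 * 8) = 4
printf "%.4f\n", geom_mean_list(2, 8);
# cube root of 1 * 10 * 100 = 10; the value 1 contributes log(1) = 0
printf "%.4f\n", geom_mean_list(1, 10, 100);
```

The second call includes an intensity of 1, whose logarithm is 0. The count of values must be tracked separately from the running log sum for that case to come out right.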