Hi,

I created a plug-in for merging spots a while ago (see attachment) that might be useful for you. It is written in Perl and merges all duplicate spots on each individual array by calculating the geometric mean of the intensities for each reporter. The merged intensities for a specific reporter are stored at the same position for all arrays in the resulting BioAssay.

Regards,
Adam Ameur

PhD Student,
The Linnaeus Centre for Bioinformatics
Uppsala University, Sweden


charles girardot wrote:

Hi there,

I wonder if somebody has a plug-in that merges data sets (BioAssays) from different array designs. Let me be a bit more specific. We have array designs (which we print ourselves) that are very similar yet slightly different (e.g. the order in which the plates were printed differs, or some plates are missing in one design). If I am correct, BASE matches up spots based on their position when, e.g., plotting scatter plots to compare signals in hyb 1 to those in hyb 2. When I created the array designs, I made sure that the features point to the same reporters. What we'd like to do is compare results obtained for the same reporter on the different designs. Note that here reporter = probe, i.e. we want to compare things that are really the same, just spotted at different places.
I was thinking about a plug-in that would:
1. Pick arbitrarily one of the two array designs as a reference
2. Convert the other array design to this reference
3. Keep only the spots (based on reporter id matching) that are common; the other ones would be filtered out
4. Make the resulting value of merged spots customizable, e.g. the mean, or the highest, or whatever
5. Keep the relation to the parent BioAssay

Or something similar.
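
The steps above could be sketched roughly as follows in Perl. This is only an illustration under stated assumptions: the data layout (a hash from reporter id to a list of spot intensities) and the names merge_by_reporter, mean and highest are hypothetical, not part of BASE or of any existing plug-in. Step 5 (keeping the relation to the parent BioAssay) depends on the BASE plug-in interface and is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: merge two assays on reporter id.
# Each assay is a hash ref: reporter id -> array ref of intensities,
# one entry per spot carrying that reporter (assumed non-empty).
# $combine is a code ref that reduces a list of intensities to one value.
sub merge_by_reporter {
    my ($reference, $other, $combine) = @_;
    my %merged;
    for my $reporter (keys %$reference) {
        # Step 3: keep only reporters present on both designs
        next unless exists $other->{$reporter};
        # Step 4: the merged value is whatever $combine computes
        $merged{$reporter} = {
            reference => $combine->(@{ $reference->{$reporter} }),
            other     => $combine->(@{ $other->{$reporter} }),
        };
    }
    return \%merged;
}

# Two example ways of combining replicate spots
sub mean {
    my $sum = 0;
    $sum += $_ for @_;
    return $sum / @_;
}

sub highest {
    my $max = shift;
    for (@_) { $max = $_ if $_ > $max }
    return $max;
}

# Made-up example data: only rep1 is present on both designs
my %design_a = (rep1 => [100, 200], rep2 => [50]);
my %design_b = (rep1 => [120],      rep3 => [70]);

my $merged = merge_by_reporter(\%design_a, \%design_b, \&mean);
for my $reporter (sort keys %$merged) {
    printf "%s\t%g\t%g\n", $reporter,
        $merged->{$reporter}{reference}, $merged->{$reporter}{other};
}
```

Passing the combining function as a code ref keeps step 4 customizable: \&mean can be swapped for \&highest without touching the matching logic.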

Any suggestions?

Cheers

Charles



_______________________________________________
The BASE general discussion mailing list
basedb-users@lists.sourceforge.net


#!/usr/local/bin/perl -w

#######################################################################
#
#  File: merge_spots.pl
#
#  Creator: Adam Ameur, Linnaeus Centre for Bioinformatics, 7/4 2004
#
#  This plug-in merges all replicate spots on each array by computing 
#  the geometric mean of intensity1 and intensity2. All spots that 
#  have the same reporter id are considered to be replicates. The 
#  result is always stored at a new position.
# 
#  Indata to this plugin is a serial BASEfile with columns: position,
#  reporter, intensity1 and intensity2. 
#
#  Changed 6/4 2005: Geometric mean is calculated using log and exp.
######################################################################


sub merge_spots;

# Reporter -> New position number
my %position_number;
my $current_position_number=1;

MAIN:
{
    
    my $status = merge_spots(\*STDIN);

    if ($status == -1){
        die "Function merge_spots.pl failed.";
    }

}


# Takes a string with comma separated numbers as argument
# and returns the geometric mean of the numbers. Returns 
# -1 if no numbers are found.
# 
# Changed 5/4 2004. Geometric mean is calculated using log and exp.
sub geometric_mean{
    my $intensity_str = $_[0];

    my @values = ($intensity_str =~ /([^\,]+)/g);

    # Sum of log values and number of values included in the sum
    my $log_sum = 0;
    my $nr_values = 0;

    for my $value (@values){
        # Skip zero and negative values; their logarithm is undefined
        if($value && $value > 0){
            $log_sum += log $value;
            $nr_values++;
        }
    }

    # No usable values found
    if($nr_values == 0){
        return -1;
    }

    # Geometric mean = exp of the arithmetic mean of the logs
    return exp($log_sum / $nr_values);
}

##########################################  
#
# Functions for merging a serial BASEfile
#
##########################################

# This function merges the intensity values for all data in a spots
# section on reporter id. The resulting intensity values are computed
# as the geometric mean of the intensities of the duplicate spots.
# Each reporter gets a new position number the first time it is seen,
# and that number is reused for the same reporter in later assays.
sub merge_spots_for_assay{
    my $stream = $_[0];
    my $line = $_[1];
    my $out_stream = $_[2];

    # The section spot header is printed directly to stdout.txt
    print $out_stream $line;

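    # Copy the remaining section header lines, up to and including the
    # '%' line that ends the header. Stop at end of file.
    while(defined($line) && !($line =~ /\%/)){
        $line = <$stream>;
        print $out_stream $line if defined $line;
    }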

    # Store all data in section spots in the mappings below:

    my %reporter_ids;  # Reporter -> {"Present"}
    my %intensity1;    # Reporter -> Intensity1 values
    my %intensity2;    # Reporter -> Intensity2 values

    while(defined($line) && !($line =~ /^\s*$/)){

        if ($line =~ /(\S+)\s+(\S+)\s+(\S+)\s+(\S+)$/){
            my $position = $1;
            my $reporter_id = $2;
            my $int1 = $3;
            my $int2 = $4;

            $reporter_ids{$reporter_id}="Present";
            
            if(!$position_number{$reporter_id}){
                $position_number{$reporter_id}=$current_position_number;
                $current_position_number++;
            }

            if(!$intensity1{$reporter_id}){
                $intensity1{$reporter_id}=$int1;
            }
            else{
                $intensity1{$reporter_id}=$intensity1{$reporter_id}.",".$int1;
            }

            if(!$intensity2{$reporter_id}){
                $intensity2{$reporter_id}=$int2;
            }
            else{
                $intensity2{$reporter_id}=$intensity2{$reporter_id}.",".$int2;
            }

        }

        $line = <$stream>;
    }

    # Go through all mappings and print the merged data
    my @rep_ids = sort {$a <=> $b} keys %reporter_ids;

    for my $rep_id (@rep_ids){

        my $intensity1_str = $intensity1{$rep_id};
        my $intensity2_str = $intensity2{$rep_id};

        # Compute new values for position and intensities
        my $new_pos = $position_number{$rep_id};

        my $new_int1 = geometric_mean($intensity1_str);
        my $new_int2 = geometric_mean($intensity2_str);

        if($new_pos == -1 || $new_int1 == -1 || $new_int2 == -1){
            return -1;
        }

        # Print the merged information
        print $out_stream join("\t", $new_pos, $rep_id, $new_int1, $new_int2), "\n";

    }

    print $out_stream "\n";

    return 0;
}




# Function for merging all spots in a serial BASEfile, 
# and writing the output to the file stdout.txt
sub merge_spots{
     my $stream = $_[0];
    
     my $outfile = "stdout.txt";
     open(OUT_FILE, ">", $outfile) or die "Can't open $outfile: $!\n";

     print OUT_FILE "BASEfile\n";
     
     # Loop through file
     while(defined(my $line = <$stream>)){

         # Merge all data in a spots section
         if($line =~ /section\s+spots/){
             my $status = merge_spots_for_assay($stream, $line, \*OUT_FILE);

             if($status == -1){
                 return -1;
             }
         }

     }

     close(OUT_FILE);
     
     return 0;
}
