Hi Nathalie,

On Wed, 12 Oct 2011 15:01:15 +0100
Nathalie Conte <n...@sanger.ac.uk> wrote:

> HI All,
> I have 2 sets of files I want to compare,and I don't know where to start 
> to get what I want :(
> I have a reference file ( see ref for example) with a chromosome name, a 
> start and a end position
> Chr7    115249090    115859515
> Chr8    25255496    29565459
> Chr13    198276698    298299815
> ChrX    109100951    109130998
> 
> 
> and I have a file (file_test) file I want to parse against this 
> reference ref.txt
> Chr1    115249098  
> Chr1    1362705  
> Chr8    25255996  
> Chr8    1362714  
> Chr1    1362735  
> ChrX    109100997   
> 
> So if the position on the file_test is found in ref_file it is kept in a 
> new file, if not discarded.

What I would do is build, for each chromosome, an array of the [$start, $end]
ranges from the reference file, sort it by start position, and merge any
overlapping ranges, so that you end up with a sorted array of
([$start1, $end1], [$start2, $end2], ...) ranges.
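
A minimal sketch of the sort-and-merge step, assuming the reference records
have already been parsed into [$chr, $start, $end] triples. The helper name
build_ranges and the second, overlapping Chr8 range are made up for
illustration; they are not from the original files.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of [$chr, $start, $end] records and
# returns a hash ref mapping each chromosome name to a sorted array of
# merged, non-overlapping [$start, $end] ranges.
sub build_ranges {
    my (@records) = @_;

    # Group the ranges by chromosome.
    my %by_chr;
    for my $rec (@records) {
        my ($chr, $start, $end) = @$rec;
        push @{ $by_chr{$chr} }, [ $start, $end ];
    }

    my %merged;
    for my $chr (keys %by_chr) {
        # Sort numerically by start position.
        my @sorted = sort { $a->[0] <=> $b->[0] } @{ $by_chr{$chr} };
        my @out;
        for my $r (@sorted) {
            if (@out && $r->[0] <= $out[-1][1]) {
                # Overlaps the previous range: extend its end if needed.
                $out[-1][1] = $r->[1] if $r->[1] > $out[-1][1];
            }
            else {
                push @out, [ @$r ];
            }
        }
        $merged{$chr} = \@out;
    }
    return \%merged;
}

my $ranges = build_ranges(
    [ 'Chr7', 115249090, 115859515 ],
    [ 'Chr8', 25255496,  29565459 ],
    [ 'Chr8', 29000000,  30000000 ],    # made-up overlapping range
    [ 'ChrX', 109100951, 109130998 ],
);

for my $chr (sort keys %$ranges) {
    print "$chr: ",
        join(', ', map { "$_->[0]-$_->[1]" } @{ $ranges->{$chr} }), "\n";
}
```

Keeping one array per chromosome means a position on Chr1 is never compared
against a range that belongs to Chr8.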

Then I would look up each position from file_test in that array using binary
search:

* http://search.cpan.org/dist/Search-Binary/

* http://search.cpan.org/~stevan/Tree-Binary-0.07/lib/Tree/Binary/Search.pm#OTHER_TREE_MODULES
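
For illustration, here is a hand-rolled binary search rather than a call into
either module above, so no module interface is assumed. The helper name
in_ranges is hypothetical, the range endpoints are treated as inclusive (an
assumption), and the data is copied from the sample lines in the question.

```perl
use strict;
use warnings;

# Assumed input: for each chromosome, a sorted array of non-overlapping
# [$start, $end] ranges (values from the reference sample in the question).
my %ranges = (
    Chr7  => [ [ 115249090, 115859515 ] ],
    Chr8  => [ [ 25255496,  29565459 ] ],
    Chr13 => [ [ 198276698, 298299815 ] ],
    ChrX  => [ [ 109100951, 109130998 ] ],
);

# Hypothetical helper: returns 1 if $pos lies inside one of the ranges
# (endpoints inclusive), 0 otherwise.
sub in_ranges {
    my ($ranges, $pos) = @_;
    my ($lo, $hi) = (0, $#$ranges);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my ($start, $end) = @{ $ranges->[$mid] };
        if    ($pos < $start) { $hi = $mid - 1 }
        elsif ($pos > $end)   { $lo = $mid + 1 }
        else                  { return 1 }
    }
    return 0;
}

# Positions from the file_test sample in the question.
my @test = (
    [ 'Chr1', 115249098 ], [ 'Chr1', 1362705 ], [ 'Chr8', 25255996 ],
    [ 'Chr8', 1362714 ],   [ 'Chr1', 1362735 ], [ 'ChrX', 109100997 ],
);

# Keep only positions whose chromosome is in the reference and whose
# coordinate falls inside one of that chromosome's ranges.
my @kept = grep {
    my ($chr, $pos) = @$_;
    exists $ranges{$chr} && in_ranges($ranges{$chr}, $pos);
} @test;

print join("\t", @$_), "\n" for @kept;
```

With the sample data this keeps the Chr8 25255996 and ChrX 109100997 lines and
discards the rest. In a real script the kept lines would be printed to the new
output file instead of STDOUT.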

Regards,

        Shlomi Fish

> 
> I am looking for advises /modules I could use to compare those 2 files .
> many thanks in advance for any tips
> Nat
> 
> 



-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Stop Using MSIE - http://www.shlomifish.org/no-ie/

Larry Wall is lazy, impatient and full of hubris.

Please reply to list if it's a mailing list post - http://shlom.in/reply .
