Hi Nathalie,

On Wed, 12 Oct 2011 15:01:15 +0100
Nathalie Conte <n...@sanger.ac.uk> wrote:

> Hi all,
> I have 2 sets of files I want to compare, and I don't know where to start
> to get what I want :(
> I have a reference file (see ref for example) with a chromosome name, a
> start and an end position
> Chr7 115249090 115859515
> Chr8 25255496 29565459
> Chr13 198276698 298299815
> ChrX 109100951 109130998
>
>
> and I have a file (file_test) I want to parse against this
> reference ref.txt
> Chr1 115249098
> Chr1 1362705
> Chr8 25255996
> Chr8 1362714
> Chr1 1362735
> ChrX 109100997
>
> So if the position on the file_test is found in ref_file it is kept in a
> new file, if not discarded.

What I would do is construct a large array of the ranges in which a position
may fall (using start/end), sort it by start position and merge overlapping
ranges, so that you end up with a sorted array of
([$start1, $end1], [$start2, $end2], ...) ranges. Keep one such array per
chromosome, because positions are only comparable within the same chromosome.
Then I would look up each position from file_test in the matching array using
binary search:

* http://search.cpan.org/dist/Search-Binary/

* http://search.cpan.org/~stevan/Tree-Binary-0.07/lib/Tree/Binary/Search.pm#OTHER_TREE_MODULES

Regards,

	Shlomi Fish

>
> I am looking for advice/modules I could use to compare those 2 files.
> many thanks in advance for any tips
> Nat
>
>

-- 
-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
Stop Using MSIE - http://www.shlomifish.org/no-ie/

Larry Wall is lazy, impatient and full of hubris.

Please reply to list if it's a mailing list post - http://shlom.in/reply .
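The approach described in the reply above (sorted, merged ranges plus a binary search) could be sketched roughly as follows. This is only a minimal illustration, not the code from either CPAN module linked above: the sub names build_ranges, in_ranges and filter_positions are made up for this example, the binary search is hand-written, ranges are assumed to be inclusive at both ends, and the sample data from the question is kept in arrays rather than read from ref.txt and file_test.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a hash of chromosome => sorted, merged [start, end] ranges
# from lines of the form "Chr start end". (Hypothetical helper.)
sub build_ranges {
    my (@lines) = @_;
    my %by_chr;
    for my $line (@lines) {
        my ($chr, $start, $end) = split ' ', $line;
        next unless defined $end;    # skip blank or malformed lines
        push @{ $by_chr{$chr} }, [ $start, $end ];
    }
    for my $chr (keys %by_chr) {
        # Sort by start position so overlapping ranges become neighbours.
        my @sorted = sort { $a->[0] <=> $b->[0] } @{ $by_chr{$chr} };
        my @merged;
        for my $r (@sorted) {
            if (@merged && $r->[0] <= $merged[-1][1]) {
                # Overlaps the previous range: extend it if needed.
                $merged[-1][1] = $r->[1] if $r->[1] > $merged[-1][1];
            }
            else {
                push @merged, [@$r];
            }
        }
        $by_chr{$chr} = \@merged;
    }
    return \%by_chr;
}

# Binary search: true if $pos lies inside one of the sorted,
# non-overlapping ranges (both ends assumed inclusive).
sub in_ranges {
    my ($ranges, $pos) = @_;
    my ($lo, $hi) = (0, $#$ranges);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        if    ($pos < $ranges->[$mid][0]) { $hi = $mid - 1 }
        elsif ($pos > $ranges->[$mid][1]) { $lo = $mid + 1 }
        else                              { return 1 }
    }
    return 0;
}

# Keep only the "Chr pos" lines whose position falls in a reference range.
sub filter_positions {
    my ($ranges_by_chr, @lines) = @_;
    my @kept;
    for my $line (@lines) {
        my ($chr, $pos) = split ' ', $line;
        next unless defined $pos;
        my $ranges = $ranges_by_chr->{$chr} or next;  # chromosome not in ref
        push @kept, "$chr $pos" if in_ranges($ranges, $pos);
    }
    return @kept;
}

# Sample data copied from the question.
my @ref = (
    'Chr7 115249090 115859515',
    'Chr8 25255496 29565459',
    'Chr13 198276698 298299815',
    'ChrX 109100951 109130998',
);
my @test = (
    'Chr1 115249098', 'Chr1 1362705', 'Chr8 25255996',
    'Chr8 1362714',   'Chr1 1362735', 'ChrX 109100997',
);

my @kept = filter_positions(build_ranges(@ref), @test);
print "$_\n" for @kept;
```

With the sample data this prints "Chr8 25255996" and "ChrX 109100997". For real input you would read the lines of the two files with open and a while (<$fh>) loop and write the kept lines to a new file; for a handful of reference ranges a plain linear scan would work just as well, and the binary search only starts to matter when the reference file is large.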