Hello Yongchao,

Thank you for reporting this issue. You are correct that these should match, and we have updated the relevant files accordingly. Please do not hesitate to contact us again if you are still seeing discrepancies.

Best regards,

Pauline Fujita
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu

On 7/3/12 7:35 AM, Yongchao Ge wrote:
> Hi,
>
> I am working on the sequence data for rn4. There seem to be three ways
> to access the sequence data:
>
> 1. chr*.fa.gz:
>    ftp://hgdownload.cse.ucsc.edu/goldenPath/rn4/chromosomes/chr*.fa.gz
> 2. chromFa.tar.gz:
>    ftp://hgdownload.cse.ucsc.edu/goldenPath/rn4/bigZips/chromFa.tar.gz
> 3. rn4.2bit: ftp://hgdownload.cse.ucsc.edu/goldenPath/rn4/bigZips/rn4.2bit
>    and then use the twoBitToFa command to convert the data into FASTA
>    format.
>
> Options 2 (chromFa.tar.gz) and 3 (rn4.2bit) give identical sequences
> for chr1. However, there are differences between options 1
> (chr1.fa.gz) and 2 (chromFa.tar.gz). The output of the diff command
> comparing the two files on my Linux computer is at the end of this
> email.
>
> My understanding is that both files should be identical, as
> "Repeats from RepeatMasker and Tandem Repeats Finder (with period
> of 12 or less) are shown in lower case; non-repeating sequence is
> shown in upper case."
>
> My questions are: what caused the difference between the two files?
> Was it possibly caused by a different version of RepeatMasker or
> Tandem Repeats Finder, or by different parameter settings in those
> two programs? And which file should I use for extracting the sequence?
>
> Thanks,
>
> Yongchao
>
>
> -------------------------------------------------------------------------------------------------------
> chr1.fa is the unzipped file chr1.fa.gz (option 1) and
> chromFa/1/chr1.fa is extracted from the file chromFa.tar.gz (option
> 2).
>
> $ diff chr1.fa chromFa/1/chr1.fa |less
> 342,343c342,343
> < ACTGCCTAAAGCAATACTAATTAGTAAGTTTTGGTGGCAAATGAGCTCTC
> < AGAAGCCTAAACATAttgagaacaggcaatctccattaatgggaggttgc
> ---
> > ACTGCCTAAAGCAATACTAATTAGTAAGTTTTGGTGGCAAATGAGCTCTc
> > agaagcctaaacatattgagaacaggcaatctccattaatgggaggttgc
> 385,386c385,386
> < AGCATATCCAAGATATTGTACTGTTTAATTTTTATCACCTTGATAAAATT
> < AGAACCATTTGAGAGAAGGAAATGAGAACATGAGTTTAAGGGCCTTCTTT
> ---
> > AGCATATCCAAGATATtgtactgtttaatttttatcaccttgataaaatt
> > agaaccatttgagagaaggaaaTGAGAACATGAGTTTAAGGGCCTTCTTT
> 653,654c653,654
> < acagtcaatgtctggcactgtggtatcccaaatatctgctagatatcttA
> < AGTTtcatagcactgagtgcctccacaataaaacaggagatagcatgcat
> ---
> > acagtcaatgtctggcactgtggtatcccaaatatctgctagatatctta
> > agtttcatagcactgagtgcctcCACAATAaaacaggagatagcatgcat
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
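
The diff hunks quoted in this thread appear to differ only in letter case, which is what one would expect if the two files were soft-masked differently rather than built from different sequence. A minimal Python sketch of how that could be checked is given here. The function names read_fasta_sequence and classify_differences are hypothetical helpers written for this illustration, not part of any UCSC tool, and the input is the first hunk (lines 342-343) copied from the diff output.

```python
def read_fasta_sequence(lines):
    """Join the sequence lines of a single-record FASTA, skipping header lines."""
    return "".join(line.strip() for line in lines if not line.startswith(">"))


def classify_differences(seq_a, seq_b):
    """Count differing positions, split into case-only and real base changes.

    Returns a tuple (case_only, base_changes).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences differ in length")
    case_only = 0
    base_changes = 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            continue
        if a.upper() == b.upper():
            case_only += 1      # same base, different masking
        else:
            base_changes += 1   # genuinely different base
    return case_only, base_changes


# First hunk of the diff (lines 342-343), as quoted in the email.
from_chromosomes_dir = [
    "ACTGCCTAAAGCAATACTAATTAGTAAGTTTTGGTGGCAAATGAGCTCTC",
    "AGAAGCCTAAACATAttgagaacaggcaatctccattaatgggaggttgc",
]
from_chromfa_tarball = [
    "ACTGCCTAAAGCAATACTAATTAGTAAGTTTTGGTGGCAAATGAGCTCTc",
    "agaagcctaaacatattgagaacaggcaatctccattaatgggaggttgc",
]

case_only, base_changes = classify_differences(
    read_fasta_sequence(from_chromosomes_dir),
    read_fasta_sequence(from_chromfa_tarball),
)
print(f"case-only differences: {case_only}, base changes: {base_changes}")
```

To run the same check on whole files, the two lists could be replaced with the lines of chr1.fa and chromFa/1/chr1.fa. A base_changes count of zero would suggest that only the repeat masking differs between the files.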
