Hello Yongchao,

Thank you for reporting this issue. You are correct that these should 
match and we have updated the relevant files accordingly. Please do not 
hesitate to contact us again if you are still seeing discrepancies.
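
If it helps to check the updated files yourself, here is a minimal Python
sketch (not a UCSC tool; the function names read_fasta and compare_masking
are made up for illustration). It counts how many positions differ in the
base itself and how many differ only in upper/lower case, that is, only in
repeat masking. It assumes both files are available locally, that each
sequence fits in memory, and that the two sequences have the same length.

```python
import gzip


def read_fasta(path):
    """Return {record_name: sequence} from a FASTA file (plain or .gz)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    seqs, name, chunks = {}, None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Store the previous record before starting a new one.
                if name is not None:
                    seqs[name] = "".join(chunks)
                # Record name is the first word after ">".
                name = line[1:].split(None, 1)[0] if len(line) > 1 else ""
                chunks = []
            else:
                chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def compare_masking(seq_a, seq_b):
    """Return (base_mismatches, mask_only_mismatches) for two sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences differ in length")
    base = mask = 0
    for a, b in zip(seq_a, seq_b):
        if a.upper() != b.upper():
            base += 1   # different nucleotide
        elif a != b:
            mask += 1   # same nucleotide, different case (masking)
    return base, mask


# Hypothetical usage with the file names from the original message; the
# record name "chr1" is an assumption about the FASTA header:
#   a = read_fasta("chr1.fa.gz")["chr1"]
#   b = read_fasta("chromFa/1/chr1.fa")["chr1"]
#   print(compare_masking(a, b))
```

A result whose first number is zero would indicate that the two files
contain the same bases and differ, if at all, only in masking.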


Best regards,

Pauline Fujita
UCSC Genome Bioinformatics Group
http://genome.ucsc.edu



On 7/3/12 7:35 AM, Yongchao Ge wrote:
> Hi,
>
> I am working with the sequence data for rn4. There seem to be three
> ways to access the sequence data:
>
> 1. chr*.fa.gz:
> ftp://hgdownload.cse.ucsc.edu/goldenPath/rn4/chromosomes/chr*.fa.gz
> 2. chromFa.tar.gz:
> ftp://hgdownload.cse.ucsc.edu/goldenPath/rn4/bigZips/chromFa.tar.gz
> 3. rn4.2bit: ftp://hgdownload.cse.ucsc.edu/goldenPath/rn4/bigZips/rn4.2bit
> and then use the twoBitToFa command to convert the data into fasta
> format.
>
> Options 2 (chromFa.tar.gz) and 3 (rn4.2bit) give identical sequences
> for chr1. However, there are differences between options 1
> (chr1.fa.gz) and 2 (chromFa.tar.gz). On my Linux computer, the output
> of the diff command comparing the two files is shown at the end of
> this email.
>
> My understanding is that both files should be identical, as
> "Repeats from RepeatMasker and Tandem Repeats Finder (with period
>      of 12 or less) are shown in lower case; non-repeating sequence is
>      shown in upper case."
>
> My questions are: what caused the difference between the two files?
> Was it possibly caused by different versions of RepeatMasker or Tandem
> Repeats Finder, or by different parameter settings in those two
> programs? And which file should I use to extract the sequence?
>
> Thanks,
>
> Yongchao
>
>
> -------------------------------------------------------------------------------------------------------
> chr1.fa is the unzipped file chr1.fa.gz (option 1) and
> chromFa/1/chr1.fa is extracted from the file chromFa.tar.gz (option
> 2).
>
> $ diff chr1.fa chromFa/1/chr1.fa |less
> 342,343c342,343
> < ACTGCCTAAAGCAATACTAATTAGTAAGTTTTGGTGGCAAATGAGCTCTC
> < AGAAGCCTAAACATAttgagaacaggcaatctccattaatgggaggttgc
> ---
>> ACTGCCTAAAGCAATACTAATTAGTAAGTTTTGGTGGCAAATGAGCTCTc
>> agaagcctaaacatattgagaacaggcaatctccattaatgggaggttgc
> 385,386c385,386
> < AGCATATCCAAGATATTGTACTGTTTAATTTTTATCACCTTGATAAAATT
> < AGAACCATTTGAGAGAAGGAAATGAGAACATGAGTTTAAGGGCCTTCTTT
> ---
>> AGCATATCCAAGATATtgtactgtttaatttttatcaccttgataaaatt
>> agaaccatttgagagaaggaaaTGAGAACATGAGTTTAAGGGCCTTCTTT
> 653,654c653,654
> < acagtcaatgtctggcactgtggtatcccaaatatctgctagatatcttA
> < AGTTtcatagcactgagtgcctccacaataaaacaggagatagcatgcat
> ---
>> acagtcaatgtctggcactgtggtatcccaaatatctgctagatatctta
>> agtttcatagcactgagtgcctcCACAATAaaacaggagatagcatgcat
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome

