Thanks, Pauline.

I manually checked a couple of chromosomes and found that you have
updated the chr*.fa.gz files at
ftp://hgdownload.cse.ucsc.edu/goldenPath/rn4/chromosomes, though their
date was not updated to July 2012. These fa files are now the same as
those from the rn4.2bit file or from chromFa.tar.gz at
ftp://hgdownload.cse.ucsc.edu/goldenPath/rn4/bigZips.
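
For anyone who wants to repeat this kind of check, here is a minimal Python sketch, not the exact procedure I used. It reads two FASTA files and counts, per sequence, how many positions differ only in case (i.e. in soft-masking) and how many differ in the actual base. The function names read_fasta and classify_differences are my own illustrative choices, and the file paths in the trailing comment are the ones from my original message below.

```python
import gzip
from itertools import zip_longest


def read_fasta(path):
    """Return {record name: sequence} for a plain or gzip-compressed FASTA file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    records, name, chunks = {}, None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if name is not None:
                    records[name] = "".join(chunks)
                fields = line[1:].split()
                name, chunks = (fields[0] if fields else ""), []
            elif line:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def classify_differences(seq_a, seq_b):
    """Return (case_only, base) counts of differing positions.

    case_only: same base, different case (a masking difference).
    base: different base, or one sequence is longer than the other.
    """
    case_only = base = 0
    for a, b in zip_longest(seq_a, seq_b, fillvalue=""):
        if a == b:
            continue
        if a.upper() == b.upper():
            case_only += 1
        else:
            base += 1
    return case_only, base


# Example usage (paths as in the original message):
# old = read_fasta("chr1.fa")
# new = read_fasta("chromFa/1/chr1.fa")
# print(classify_differences(old["chr1"], new["chr1"]))
```

A result with a nonzero first count and a zero second count would mean the two files hold the same bases and differ only in which regions are lower-cased.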

Best,

Yongchao

On Fri, Jul 6, 2012 at 6:01 PM, Pauline Fujita <[email protected]> wrote:
> Hello Yongchao,
>
> Thank you for reporting this issue. You are correct that these should match
> and we have updated the relevant files accordingly. Please do not hesitate
> to contact us again if you are still seeing discrepancies.
>
>
> Best regards,
>
> Pauline Fujita
> UCSC Genome Bioinformatics Group
> http://genome.ucsc.edu
>
>
>
>
> On 7/3/12 7:35 AM, Yongchao Ge wrote:
>>
>> Hi,
>>
>> I am working on the sequence data for rn4. There seem to be three ways
>> to access the sequence data:
>>
>> 1. chr*.fa.gz:
>> ftp://hgdownload.cse.ucsc.edu/goldenPath/rn4/chromosomes/chr*.fa.gz
>> 2. chromFa.tar.gz:
>> ftp://hgdownload.cse.ucsc.edu/goldenPath/rn4/bigZips/chromFa.tar.gz
>> 3. rn4.2bit: ftp://hgdownload.cse.ucsc.edu/goldenPath/rn4/bigZips/rn4.2bit
>> and then use the twoBitToFa command to convert the data into fasta
>> format.
>>
>> Options 2 (chromFa.tar.gz) and 3 (rn4.2bit) give identical sequences
>> for chr1. However, there are differences between options 1
>> (chr1.fa.gz) and 2 (chromFa.tar.gz). The output of the diff command
>> comparing the two files on my Linux computer is at the end of this email.
>>
>> My understanding is that both files should be identical, since in both
>> "Repeats from RepeatMasker and Tandem Repeats Finder (with period
>>      of 12 or less) are shown in lower case; non-repeating sequence is
>>      shown in upper case."
>>
>> My questions are: what caused the difference between the two files?
>> Was it possibly caused by different versions of RepeatMasker or Tandem
>> Repeats Finder, or by different parameter settings in those two programs?
>> And which file should I use for extracting the sequence?
>>
>> Thanks,
>>
>> Yongchao
>>
>>
>>
>> -------------------------------------------------------------------------------------------------------
>> chr1.fa is the unzipped file chr1.fa.gz (option 1) and
>> chromFa/1/chr1.fa is extracted from the file chromFa.tar.gz (option
>> 2).
>>
>> $ diff chr1.fa chromFa/1/chr1.fa |less
>> 342,343c342,343
>> < ACTGCCTAAAGCAATACTAATTAGTAAGTTTTGGTGGCAAATGAGCTCTC
>> < AGAAGCCTAAACATAttgagaacaggcaatctccattaatgggaggttgc
>> ---
>> > ACTGCCTAAAGCAATACTAATTAGTAAGTTTTGGTGGCAAATGAGCTCTc
>> > agaagcctaaacatattgagaacaggcaatctccattaatgggaggttgc
>> 385,386c385,386
>> < AGCATATCCAAGATATTGTACTGTTTAATTTTTATCACCTTGATAAAATT
>> < AGAACCATTTGAGAGAAGGAAATGAGAACATGAGTTTAAGGGCCTTCTTT
>> ---
>> > AGCATATCCAAGATATtgtactgtttaatttttatcaccttgataaaatt
>> > agaaccatttgagagaaggaaaTGAGAACATGAGTTTAAGGGCCTTCTTT
>> 653,654c653,654
>> < acagtcaatgtctggcactgtggtatcccaaatatctgctagatatcttA
>> < AGTTtcatagcactgagtgcctccacaataaaacaggagatagcatgcat
>> ---
>> > acagtcaatgtctggcactgtggtatcccaaatatctgctagatatctta
>> > agtttcatagcactgagtgcctcCACAATAaaacaggagatagcatgcat
>>
>> _______________________________________________
>> Genome maillist  -  [email protected]
>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
>
>
>