Hi Ben,

Unfortunately, the Table Browser is not well suited to genome-wide 
queries on assemblies with a large number of scaffolds, such as 
felCat4. Our developers suggest two possible workarounds:

1) First, use the fetchChromSizes script (described at 
http://genome.ucsc.edu/goldenPath/help/bigBed.html) and Unix/GNU 
utilities to make a file containing chromosome regions for the first 40 
sequences from felCat4:

  fetchChromSizes felCat4 | head -40 |
    sed -re 's/([^[:space:]]+)[[:space:]]+/\1:1-/' > felCat4Top40.txt

In the Table Browser, click the 'define regions' button and upload that 
file. Then proceed as usual.

This works by uploading a list of chromosome regions, so that the Table 
Browser runs a small number of MySQL queries, as it does for human, 
mouse, etc. Unfortunately, it will inevitably miss items that are on 
the smaller scaffolds, but at least items on assembled chromosomes and 
the largest scaffolds will be returned.
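
To see what the sed expression in the command above does, here is a 
minimal sketch run on two made-up lines in the two-column name/size 
layout that fetchChromSizes prints. The sequence names and sizes are 
invented for illustration, and the to_regions helper is hypothetical. 
It uses -E rather than -r, since -E is accepted by both GNU and BSD sed.

```shell
# Hypothetical helper: convert "name<whitespace>size" lines into regions.
to_regions() {
  # Replace the first field and the whitespace after it with "name:1-",
  # so "name size" becomes the region "name:1-size".
  sed -E 's/([^[:space:]]+)[[:space:]]+/\1:1-/'
}

# Made-up sample input (not real felCat4 sizes).
regions=$(printf 'chrA1\t1000\nscaffold_7\t250\n' | to_regions)
printf '%s\n' "$regions"
```

Each output line is a whole-sequence region of the form name:1-size, 
which is the kind of line the 'define regions' upload is given in the 
step above.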

2) "Use the twoBitToFa and faRc programs as well as the felCat4.2bit 
file. First, fetch the coordinates of the genes from the refGene table 
(http://hgdownload.cse.ucsc.edu/goldenPath/felCat4/database/refGene.txt.gz). 
Next, use the bed* utilities to get the exon coordinates. Format these 
coordinates so they can be used with the -seqList option of twoBitToFa. 
You should separate this operation into two procedures, one for + strand 
items, one for - strand items. Fetch the sequence out of the 2bit file, 
use faRc for the - strand items."
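
A rough sketch of the procedure described above, under several 
assumptions: it uses awk in place of the bed* utilities, it assumes the 
uncompressed refGene.txt has chrom in column 3, strand in column 4, 
exonStarts in column 10 and exonEnds in column 11, and it assumes 
refGene.txt and felCat4.2bit are in the current directory. The 
exon_list helper and the output file names are hypothetical. It emits 
whole exons and does not clip them to the CDS start and end.

```shell
# Hypothetical helper: turn refGene rows into -seqList entries for one strand.
# Assumed columns: 3=chrom, 4=strand, 10=exonStarts, 11=exonEnds
# (comma-separated, 0-based half-open, the convention -seqList also uses).
exon_list() {
  awk -F'\t' -v strand="$1" '$4 == strand {
    n = split($10, s, ","); split($11, e, ",")
    for (i = 1; i <= n; i++)
      if (s[i] != "") print $3 ":" s[i] "-" e[i]   # skip the empty field after the trailing comma
  }'
}

# Made-up refGene-style row, only to show the output format.
printf '585\tNM_000001\tchrA1\t+\t100\t400\t100\t400\t2\t100,300,\t200,400,\n' |
  exon_list +

# Real run, attempted only if the tools and input files are present.
if command -v twoBitToFa >/dev/null 2>&1 && command -v faRc >/dev/null 2>&1 &&
   [ -f refGene.txt ] && [ -f felCat4.2bit ]; then
  exon_list + < refGene.txt > plus.lst
  exon_list - < refGene.txt > minus.lst
  twoBitToFa -seqList=plus.lst  felCat4.2bit plus.fa
  twoBitToFa -seqList=minus.lst felCat4.2bit minus.fa
  faRc minus.fa minus.rc.fa   # reverse-complement the - strand sequences
fi
```

The + and - strand items are kept in separate lists so that only the 
- strand sequences are passed through faRc.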

I hope this information is helpful. Please feel free to contact the 
mailing list again if you require further assistance.

Best,
Mary
------------------
Mary Goldman
UCSC Bioinformatics Group

On 4/11/11 5:38 PM, Ben Neely wrote:
> Greg,
>
> Thanks a ton for those directions, but I have one issue: at the bottom of
> the output using the steps below (as well as selecting gzip) this is what I
> get
> carefulAlloc: Allocated too much memory - more than 6,442,450,941 bytes
> (6,442,456,810)
>
> Sorry to be such a burden, and hopefully this isn't too trivial. I actually
> think these are probably in the major databases considering there are 219
> entries in Swiss-Prot<http://www.uniprot.org/taxonomy/9685>, and 231 in
> Ensembl<http://useast.ensembl.org/Felis_catus/Info/StatsTable?db=core>.
> Still it would be nice to at least confirm we are working with the most up
> to date.
>
>
> Thanks again,
> Ben
>
> On Mon, Apr 11, 2011 at 8:09 PM, Greg Roe<[email protected]>  wrote:
>
>> Hi Benjamin,
>>
>> You can use the table browser to get the data. From the Genome Browser home
>> page, select Tables from the top menu and:
>>
>> Select (Clade/Genome/Assembly)
>> Mammal/Cat/felCat4 and:
>>
>> group: Gene and Gene prediction tracks
>> track: refSeq Genes (or one of the other gene sets)
>> table: refGene
>> region: genome (or select position to get a sequence for a specific region)
>> identifiers (names/accessions): if applicable, click on "paste list" and
>> paste in the identifiers following the instructions.
>> output format: sequence
>> Click get output
>>
>> Select sequence type: genomic
>> Click Submit
>>
>> On the sequence retrieval options page, uncheck all boxes except CDS Exons.
>> Then click Get Sequence.
>>
>> I'm not sure what version of the cat genome is currently hosted at Ensembl
>> or UniProt.  I do know Ensembl does host a cat genome. Our felCat4 =
>> NHGRI/GTB V17e.
>>
>> If you have any additional questions, feel free to contact us again at:
>> [email protected]
>>
>> -
>> Greg Roe
>> UCSC Genome Bioinformatics Group
>>
>>
>>
>>   -------- Original Message --------
>>> Subject:     Protein FASTA of GTB V17E assembly (December 2008, UCSC
>>> version felCat4)
>>> Date:     Mon, 11 Apr 2011 13:23:41 -0400
>>>
>>> I am currently doing some proteomic analysis of Felis catus samples and
>>> was making sure I had the most current protein database. Is the felCat4
>>> release included in public databases such as UniProt or Ensembl, and
>>> more specifically, do you have a CDS translated fasta of felCat4
>>> available? I have looked on various websites and couldn't find this,
>>> which is why I am emailing your group. If there is someone else I should
>>> ask about this, I appreciate any help in directing me to them.
>>>
>>> Thank you for your time,
>>> Benjamin Neely, Ph.D.
>>> Post-doc
>>> Nephrology Proteomics Laboratory
>>> Medical University of South Carolina
>>> Charleston, SC, USA
>>>
> _______________________________________________
> Genome maillist  -  [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome