Hi Ben,

Unfortunately, the Table Browser is not useful for performing genome-wide queries on assemblies with a large number of scaffolds, like felCat4. Two possible workarounds suggested by our developers:
1) First, use the fetchChromSizes script (described in
http://genome.ucsc.edu/goldenPath/help/bigBed.html) and Unix/GNU utilities
to make a file containing chromosome regions for the first 40 sequences
from felCat4:

fetchChromSizes felCat4 | head -40 | sed -re 's/([^[:space:]]+)[[:space:]]+/\1:1-/' > felCat4Top40.txt

In the Table Browser, click the 'define regions' button and upload that
file. Then proceed as usual. This works by uploading a list of chromosome
regions so that the Table Browser does a smaller number of MySQL queries,
as it does for human, mouse, etc. Unfortunately, it will inevitably miss
some items that are on smaller scaffolds, but at least items on assembled
chromosomes and the largest scaffolds will be returned.

2) "Use the twoBitToFa and faRc programs as well as the felCat4.2bit file.
First, fetch the coordinates of the genes from the refGene table
(http://hgdownload.cse.ucsc.edu/goldenPath/felCat4/database/refGene.txt.gz).
Next, use the bed* utilities to get the exon coordinates. Format these
coordinates so they can be used with the -seqList option of twoBitToFa. You
should separate this operation into two procedures, one for + strand items,
one for - strand items. Fetch the sequence out of the 2bit file, use faRc
for the - strand items."

I hope this information is helpful. Please feel free to contact the mail
list again if you require further assistance.

Best,
Mary
------------------
Mary Goldman
UCSC Bioinformatics Group

On 4/11/11 5:38 PM, Ben Neely wrote:
> Greg,
>
> Thanks a ton for those directions, but I have one issue: at the bottom of
> the output using the steps below (as well as selecting gzip) this is what I
> get
> carefulAlloc: Allocated too much memory - more than 6,442,450,941 bytes
> (6,442,456,810)
>
> Sorry to be such a burden, and hopefully this isn't too trivial. I actually
> think these are probably in the major databases considering there are 219
> entries in Swiss-Prot<http://www.uniprot.org/taxonomy/9685>, and 231 in
> Ensembl<http://useast.ensembl.org/Felis_catus/Info/StatsTable?db=core>.
> Still it would be nice to at least confirm we are working with the most up
> to date.
>
>
> Thanks again,
> Ben
>
> On Mon, Apr 11, 2011 at 8:09 PM, Greg Roe<[email protected]> wrote:
>
>> Hi Benjamin,
>>
>> You can use the table browser to get the data. From the Genome Browser home
>> page, select Tables from the top menu and:
>>
>> Select (Clade/Genome/Assembly)
>> Mammal/Cat/felCat4 and:
>>
>> group: Gene and Gene prediction tracks
>> track: refSeq Genes (or one of the other gene sets)
>> table: refGene
>> region: genome (or select position to get a sequence for a specific region)
>> identifiers (names/accessions): if applicable, click on "paste list" and
>> paste in the identifiers following the instructions.
>> output format: sequence
>> Click get output
>>
>> Select sequence type: genomic
>> Click Submit
>>
>> On the sequence retrieval options page, uncheck all boxes except CDS Exons.
>> Then click Get Sequence.
>>
>> I'm not sure what version of the cat genome is currently hosted at Ensembl
>> or UniProt. I do know Ensembl does host a cat genome. Our felCat4 =
>> NHGRI/GBT V17e.
>>
>> If you have any additional questions, feel free to contact us again at:
>> [email protected]
>>
>> -
>> Greg Roe
>> UCSC Genome Bioinformatics Group
>>
>>
>>
>> -------- Original Message --------
>>> Subject: Protein FASTA of GTB V17E assembly (December 2008, UCSC
>>> version felCat4)
>>> Date: Mon, 11 Apr 2011 13:23:41 -0400
>>>
>>> I am currently doing some proteomic analysis of Felis catus samples and
>>> was making sure I had the most current protein database. Is the felCat4
>>> release included in public databases such as UniProt or Ensembl, and
>>> more specifically, do you have a CDS translated fasta of felCat4
>>> available? I have looked on various websites and couldn't find this,
>>> which is why I am emailing your group. If there is someone else I should
>>> ask about this, I appreciate any help in directing me to them.
>>>
>>> Thank you for your time,
>>> Benjamin Neely, Ph.D.
>>> Post-doc
>>> Nephrology Proteomics Laboratory
>>> Medical University of South Carolina
>>> Charleston, SC, USA
>>>
> _______________________________________________
> Genome maillist - [email protected]
> https://lists.soe.ucsc.edu/mailman/listinfo/genome
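
A rough sketch of workaround 2 from the reply at the top of this thread, in case a concrete starting point helps. It is not the developers' exact procedure: it uses awk in place of the bed* utilities, and exon_regions is a hypothetical helper name. It assumes refGene.txt has the usual column order (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...) and that the 0-based, half-open coordinates in that table can be passed unchanged to twoBitToFa -seqList, which to my knowledge uses the same convention.

```shell
#!/bin/sh
# Sketch only: turn refGene rows into region specs (chrom:start-end),
# one line per exon, for use with twoBitToFa -seqList.
# exon_regions is a hypothetical helper, not a UCSC tool.
exon_regions() {
    # $1 is the strand to keep: "+" or "-"
    awk -F'\t' -v strand="$1" '
        $4 == strand {
            split($10, starts, ",")   # exonStarts, comma-separated
            split($11, ends, ",")     # exonEnds, comma-separated
            for (i = 1; i <= $9; i++) {   # field 9 is exonCount
                print $3 ":" starts[i] "-" ends[i]
            }
        }'
}

# Example pipeline, commented out because it needs the UCSC tools and data
# (file names other than refGene.txt.gz and felCat4.2bit are made up):
#   gunzip -c refGene.txt.gz | exon_regions + > plus.lst
#   gunzip -c refGene.txt.gz | exon_regions - > minus.lst
#   twoBitToFa -seqList=plus.lst  felCat4.2bit plus.fa
#   twoBitToFa -seqList=minus.lst felCat4.2bit minus.fa
#   faRc minus.fa minusRc.fa
```

This yields one FASTA record per exon. Clipping exons to cdsStart/cdsEnd, joining the exons of each transcript, and translating to protein are separate steps that the sketch does not attempt.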
