Hi Li,

One of our engineers suggests the following, using the kent source tree (http://genome.ucsc.edu/FAQ/FAQlicense.html#license3):

> They should try this kent source tree command operation:
>
> $ genePredToGtf hg19 knownGene stdout | sort -k1,1 -k4,4 | gzip -c > hg19.knownGene.gtf.gz
>
> With a file in their home directory called .hg.conf with three lines:
>
> db.host=genome-mysql.cse.ucsc.edu
> db.user=genomep
> db.password=password
>
> It does give a much better GTF output than the table browser.
>

Please let us know if you have any additional questions: [email protected]

-
Greg Roe
UCSC Genome Bioinformatics Group

On 7/12/11 1:51 PM, Jia, Li (NIH/NCI) [C] wrote:
> Hi Greg,
>
> Thanks for the response. The FAQ on GTF format doesn't answer my question.
> As you suggested, if I select "All fields from selected table", the output
> format is only in txt, not GTF. I really need GTF format with both
> Transcript_ID and Gene_name there. I combined the two outputs from the UCSC
> refSeq table and refFlat table; the result includes all the information I
> need and looks like this:
>
> chr1 protein_coding CDS 67162933 67163102 0.000000 - 0 gene_id "NM_207014"; transcript_id "NM_207014"; gene_name "WDR78";
> chr1 protein_coding start_codon 67163100 67163102 0.000000 - . gene_id "NM_207014"; transcript_id "NM_207014"; gene_name "WDR78";
> chr1 protein_coding exon 67162933 67163158 0.000000 - . gene_id "NM_207014"; transcript_id "NM_207014"; gene_name "WDR78";
> chr1 protein_coding stop_codon 58719225 58719227 0.000000 - . gene_id "NM_145243"; transcript_id "NM_145243"; gene_name "OMA1";
> chr1 protein_coding CDS 58719228 58719434 0.000000 - 0 gene_id "NM_145243"; transcript_id "NM_145243"; gene_name "OMA1";
>
>
> Unfortunately, it didn't work when I tried to use it in the analysis.
>
> Do you have any other suggestions?
>
> Thanks,
> Li
>
> On 7/11/11 6:35 PM, "Greg Roe"<[email protected]> wrote:
>
>> Hi Li,
>>
>> Please see this section of our help describing the GTF file format:
>> http://genome.ucsc.edu/FAQ/FAQformat.html#format4.
>>
>> If you want to generate the data exactly like the table schema, for the
>> output format in the Table Browser, select "All fields from selected
>> table".
>>
>> Please let us know if you have any additional questions:
>> [email protected]
>>
>> -
>> Greg Roe
>> UCSC Genome Bioinformatics Group
>>
>>
>>
>> On 7/11/11 1:13 PM, Jia, Li (NIH/NCI) [C] wrote:
>>> Hi,
>>>
>>> I am using the Table Browser to generate an annotation file in GTF format.
>>> After selecting the assembly of interest, I select:
>>>
>>> group: Genes and Gene Prediction Tracks
>>> track: refSeq Gene
>>> table: refFlat
>>> output format: "GTF"--Gene transfer format
>>>
>>> then give the name and output the GTF file.
>>>
>>> My question is that my output refFlat.GTF is not exactly the same as the
>>> described table schema. In the table schema, the output format is as follows:
>>>
>>> geneName    LOC100288778
>>> Name        NR_028269
>>> chrom       chr1
>>> strand      -
>>> txStart     4224
>>> txEnd       7502
>>> cdsStart    7502
>>> cdsEnd      7502
>>> exonCount   7
>>> exonStarts  4224,4832,5658,6469,6719,70...
>>> exonEnds    4692,4901,5810,6631,6918,72...
>>>
>>> but my output file is:
>>> chr1 hg18_refFlat exon 14601 14754 0.000000 - . gene_id "WASH7P"; transcript_id "WASH7P";
>>> chr1 hg18_refFlat exon 19184 19233 0.000000 - . gene_id "WASH7P"; transcript_id "WASH7P";
>>> chr1 hg18_refFlat exon 24474 25037 0.000000 - . gene_id "FAM138A"; transcript_id "FAM138A";
>>> chr1 hg18_refFlat exon 25140 25344 0.000000 - . gene_id "FAM138A"; transcript_id "FAM138A";
>>>
>>> It has the geneName (gene_id), but there is no distinct transcript_id (in
>>> the output, it is the same as gene_id). In the example schema, should Name
>>> be the transcript_id?
>>>
>>> How do I generate the table exactly like the table schema?
>>>
>>> Thanks,
>>> Li
>>> _______________________________________________
>>> Genome maillist - [email protected]
>>> https://lists.soe.ucsc.edu/mailman/listinfo/genome
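
For anyone following the thread, the engineer's suggestion at the top can be sketched as a short shell session. This is only a sketch under a few assumptions that are not stated in the thread: that genePredToGtf has already been built from the kent source tree and is on your PATH, that the kent tools expect ~/.hg.conf to be readable only by its owner (hence the chmod 600), and that a numeric sort on the start column (-k4,4n rather than the posted -k4,4) is what you want. The script leaves an existing ~/.hg.conf untouched.

```shell
#!/bin/sh
# Sketch only: create ~/.hg.conf with the three lines quoted in the thread,
# then run genePredToGtf against the public UCSC MySQL server.
set -eu

conf="$HOME/.hg.conf"

# Do not clobber an existing configuration file.
if [ ! -e "$conf" ]; then
    cat > "$conf" <<'EOF'
db.host=genome-mysql.cse.ucsc.edu
db.user=genomep
db.password=password
EOF
    # Assumption: the kent tools expect this file to be private to its owner.
    chmod 600 "$conf"
fi

# Assumption: genePredToGtf was built from the kent source tree and is on PATH.
if command -v genePredToGtf >/dev/null 2>&1; then
    # -k4,4n sorts the start column numerically; the posted command used
    # -k4,4, which compares start positions as text.
    genePredToGtf hg19 knownGene stdout \
        | sort -k1,1 -k4,4n \
        | gzip -c > hg19.knownGene.gtf.gz
else
    echo "genePredToGtf not found on PATH; build it from the kent source tree first" >&2
fi
```

If the command succeeds, `gzip -dc hg19.knownGene.gtf.gz | head` is a quick way to check whether the attributes column carries the gene and transcript identifiers you need.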
