Re: [Bioc-sig-seq] unexpected genes names list using getBM{biomaRt}

Mohamed-Ramzi TEMANNI Tue, 08 Dec 2009 10:30:38 -0800

Hi James,
I agree with you on your comment, do you think that it will be a 
solution to fetch all the data at once ?
Regards,
Ramzi



James W. MacDonald wrote:
> Hi Ramzi,
>
> That will work in the short term, and especially if you don't have 
> many queries, but in the long term is not a reasonable workaround. 
> This is because of two things; first, you can get intermittent 
> connection errors to the Biomart server which will break your loop in 
> the middle somewhere and you will have to do the whole thing over 
> again. Second, people who have public databases do not look kindly 
> upon those who abuse their servers (and repeatedly making small 
> queries instead of a single large query is considered abuse), and have 
> been known to ban IP addresses of people who abuse their server.
>
> Best,
>
> Jim
>
>
>
> Ramzi TEMANNI wrote:
>> Hi James,
>> Thanks for your reply,
>> As i have short reads of 50b, I've modified the code accordigly to 
>> your suggestion:
>> gn.m1<-getBM(attributes= c("hgnc_symbol"),
>>        filters=c("chromosome_name","
>> start",*"end"*),
>>        
>> values=list(t.cpd[1:10,1],as.numeric(t.cpd[1:10,2]),*as.numeric(t.cpd[1:10,2])+50*),
>>  
>> mart=ensembl)
>>
>> But still having more genes than expected:
>> hgnc_symbol
>> 1      UBE2G1
>> 2      DUSP5P
>> 3   HIST3H2BA
>> 4     ZNF847P
>> 5     ACTBP11
>> 6     ZC3H11B
>> ........
>> 8488     PPP2R2C
>> 8489        WFS1
>> 8490    SNORD73A
>> 8491     SNORA24
>> 8492     SNORA26
>>
>> Did I miss something  ?
>>
>> Thanks for the remark regarding the position, you are right ! I did 
>> not notice that, have to get back to the bowtie alignment file and 
>> see what is the reason behind that. I'm using bowtie with the last 
>> hg19 ref.
>>
>> Thanks again for your comment.
>> Best Regards,
>> Ramzi
>>
>> ----------------------------------------------------------------
>>
>>
>> On Mon, Dec 7, 2009 at 4:20 PM, James W. MacDonald 
>> <[email protected] <mailto:[email protected]>> wrote:
>>
>>     Hi Ramzi,
>>
>>
>>     Ramzi TEMANNI wrote:
>>
>>         Hi,
>>         I want to extract the gene names knowing the chromosome and the
>>         position for
>>         each genes:
>>
>>             t.cpd[1:10,1:2]
>>
>>              CHR.M1 POS.M1
>>          [1,] "12"   "140059033"
>>          [2,] "19"   "164634640"
>>          [3,] "10"   "32347784"
>>          [4,] "11"   "30576841"
>>          [5,] "2"    "86479831"
>>          [6,] "12"   "237019866"
>>          [7,] "4"    "76487174"
>>          [8,] "20"   "136121868"
>>          [9,] "2"    "6255547"
>>         [10,] "1"    "67658137"
>>
>>         i use the following commands:
>>         library(biomaRt)
>>         mart = useMart("ensembl")
>>         ensembl = useDataset("hsapiens_gene_ensembl", mart = mart)
>>         gn.m1<-getBM(attributes= c("hgnc_symbol"),
>>               filters=c("chromosome_name","start"),
>>               values=list(t.cpd[1:10,1],t.cpd[1:10,2]), mart=ensembl)
>>
>>         I'm expecting having a list of 10 genes names, but instead i get
>>         8652 genes:
>>         hgnc_symbol
>>         1      OR2M1P
>>         2      OR2L1P
>>         3   HSD17B7P1
>>         4     OR14L1P
>>         5       OR2W5
>>         6       VN1R5
>>         ......
>>         8649        WFS1
>>         8650    SNORD73A
>>         8651     SNORA24
>>         8652     SNORA26
>>
>>         Did I miss something ?
>>
>>
>>     Yes. You are giving the start position, but not the end. Without
>>     explicitly telling the Biomart server where to stop looking for
>>     genes, where do you think it will stop by default?
>>
>>     Also, several of your coordinates are nonsensical. For instance,
>>     chr12 is only 133851859 bases long, chr20 is 63025520 bases long, 
>> etc.
>>
>>     Best,
>>
>>     Jim
>>
>>
>>
>>         Thanks in advance for your help
>>
>>         Best Regards,
>>         Ramzi
>>
>>         ----------------------------------------------------------------
>>
>>                [[alternative HTML version deleted]]
>>
>>         _______________________________________________
>>         Bioc-sig-sequencing mailing list
>>         [email protected]
>> <mailto:[email protected]>
>>         https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>
>>
>>     --     James W. MacDonald, M.S.
>>     Biostatistician
>>     Douglas Lab
>>     University of Michigan
>>     Department of Human Genetics
>>     5912 Buhl
>>     1241 E. Catherine St.
>>     Ann Arbor MI 48109-5618
>>     734-615-7826
>>     **********************************************************
>>     Electronic Mail is not secure, may not be read every day, and should
>>     not be used for urgent or sensitive issues 

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] unexpected genes names list using getBM{biomaRt}

Reply via email to