Hi All,

I wanted to share what I found after looking into the problem more closely.

As Syed pointed out, the direct cause of the problem is that biomaRt parses
the configuration XML itself. The XML (retrieved from
http://www.ensembl.org/biomart/martservice?type=configuration&dataset=drerio_gene_ensembl)
contains an attribute whose internalName is "agilent_g2519F", which is why
atr$name[3] returned "agilent_g2519F". However, when you retrieve the
attributes through the BioMart web service (at
http://www.ensembl.org/biomart/martservice?type=attributes&dataset=drerio_gene_ensembl),
the same attribute comes back with the internalName "agilent_g2519f". When a
query is run, "agilent_g2519f" is a valid attribute but "agilent_g2519F" is
not. So once biomaRt relies on martservice calls instead of parsing the XML,
problems like this should no longer occur.
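
To make the mismatch concrete, here is a rough sketch in Python. It is only an
illustration: the XML fragment is made up and heavily simplified, the element
layout is an assumption, and names_from_xml() is a hypothetical helper, not
part of biomaRt or the BioMart Perl library.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified stand-in for the configuration XML.
# The element and attribute layout is an assumption for illustration only.
CONFIG_XML = """
<DatasetConfig dataset="drerio_gene_ensembl">
  <AttributeDescription internalName="agilent_g2518a" displayName="Agilent g2518a"/>
  <AttributeDescription internalName="agilent_g2519F" displayName="Agilent g2519f"/>
</DatasetConfig>
"""

def names_from_xml(xml_text, normalise=False):
    """Hypothetical helper: collect internalNames, optionally lower-cased."""
    root = ET.fromstring(xml_text)
    names = [el.get("internalName") for el in root.iter("AttributeDescription")]
    return [n.lower() for n in names] if normalise else names

# Names the server would accept, given that it lower-cases internalNames.
server_names = set(names_from_xml(CONFIG_XML, normalise=True))

# Names a client sees if it reads the XML directly, without lower-casing.
raw_names = names_from_xml(CONFIG_XML)

# Any raw name the server does not know would trigger "Attribute NOT FOUND".
rejected = [n for n in raw_names if n not in server_names]
print(rejected)
```

Only the name containing an upper-case letter ends up in the rejected list,
which matches what Ruben saw.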

One question remains: where does agilent_g2519F get changed to
agilent_g2519f? Here is the answer:

"agilent_g2519F" was entered as the internalName for the attribute in
MartEditor, so the XML contains it. The XML is then parsed by the Perl library
(which is also the backend of martview and martservice) when configure.pl is
run, and a bit of code there converts the internalName to lower case:
'name' => lc($xmlAttribute->{'internalName'})

I have gone into this much detail because it suggests a way to prevent similar
problems caused by mismatched case. On the BioMart software side we could
enforce lower-case internalNames from the very beginning, i.e. have MartEditor
accept only lower-case internalNames while users configure their marts, so
that case mismatches cannot arise within the system in the first place.
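
As a rough idea of what such a check could look like (a sketch only, written in
Python for brevity; non_lowercase_names() is a hypothetical helper and not
existing MartEditor code):

```python
def non_lowercase_names(internal_names):
    """Hypothetical check: return the internalNames that are not all lower case."""
    return [name for name in internal_names if name != name.lower()]

# The first four attribute names from the listAttributes() output in this thread.
names = ["affy_zebrafish", "agilent_g2518a", "agilent_g2519F", "agilent_probe"]

offending = non_lowercase_names(names)
print(offending)
```

A configuration tool could refuse to save, or at least warn, whenever the
returned list is not empty.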

Hope this helps clarify things a bit more.

Thanks,
Junjun

 

> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Syed Haider
> Sent: Wednesday, April 15, 2009 11:57 AM
> To: Steffen Durinck
> Cc: Rhoda Kinsella; Damian Smedley; Ruben; [email protected]
> Subject: Re: [mart-dev] Attribute NOT FOUND
> 
> Thanks a lot Steffen for the details, hopefully things would 
> begin to fall in place in next few months.
> 
> Best,
> Syed
> 
> 
> Steffen Durinck wrote:
> > Hi All,
> > 
> > Since December we stopped parsing the XML in biomaRt and get attribute
> > and filter names through requests to the web service.  This update has
> > been available to our users as a developmental package.  Bioconductor
> > has a cycle of releasing a new release every 6 months and
> > unfortunately there has not been a new release since last October.  By
> > the end of this month however a new release of Bioconductor comes out
> > which includes the updated version of biomaRt.
> > 
> > There is thus no need for changing attribute/filter names.  Ask users
> > to give the version number of biomaRt by using the R command
> > sessionInfo()
> > They'll see the following:
> > 
> >> sessionInfo()
> > R version 2.8.0 (2008-10-20)
> > x86_64-unknown-linux-gnu
> > 
> > locale:
> > LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
> > 
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> > 
> > other attached packages:
> > [1] biomaRt_1.99.9
> > 
> > loaded via a namespace (and not attached):
> > [1] RCurl_0.91-0 XML_1.98-1
> > 
> > 
> > if the biomaRt version is not higher than 1.99.0 they need to update
> > to using the developmental version of biomaRt which can be downloaded
> > here:
> > 
> > http://bioconductor.org/packages/2.4/bioc/html/biomaRt.html
> > 
> > By the end of this month this will be the release version of biomaRt
> > and it will be installed by default.  So there will be hopefully less
> > confusion.
> > 
> > Cheers,
> > Steffen
> > 
> > On Wed, Apr 15, 2009 at 3:50 AM, Syed Haider <[email protected]> wrote:
> >> Hi All,
> >>
> >> here is the cause of the trouble. It seems 'biomaRt' is parsing XMLs
> >> directly to retrieve attribute and filter names. The equivalent of
> >> this is to retrieve the names from appropriate web service requests.
> >> www.biomart.org or www.ensembl.org upon parsing XMLs makes all
> >> internalNames lower case hence users interacting with the web server
> >> have lower case internalNames.
> >> In case of this problematic attribute, the internalName does have 'F'
> >> at the end in XML only which gets resolved to lower case upon
> >> configuration, however, biomaRt does not do this. I would suggest:
> >>
> >> a- short term solution is to lets try and change the name as Rhoda
> >> suggested to lower case but a one liner makeInternalNameToLowerCase()
> >> in biomaRt would be a *bullet proof solution*.
> >>
> >> b- long term solution is to avoid direct interrogation of XMLs and
> >> talk to web service requests to retrieve att and filt names. In case
> >> some features are missing from web service requests, please do bring
> >> this forward and we will add these to the web service response.
> >>
> >> Thanks to all for your patience with this,
> >>
> >> Best,
> >> Syed
> >>
> >>
> >> Rhoda Kinsella wrote:
> >>> Hi Syed,
> >>>> Thats exactly my point, what i am  unable to understand is that
> >>>> where the 'F' comes from in the first place. Its not present in
> >>>> ensembl.
> >>> It is present in the Ensembl configuration files as an internal
> >>> name, which is where biomaRt gets the information from. They don't
> >>> have agilent_g2519F hardcoded anywhere in biomaRt. It is pulled out
> >>> of our meta tables as far as I understand.
> >>> Hope that clarifies things a bit better, Regards, Rhoda
> >>>
> >>>
> >>>> If its a typo during hard-coded procedure while naming 
> the atts in 
> >>>> biomaRt, then ensembl should not really be changing anything.
> >>>>
> >>>>
> >>>>
> >>>> Rhoda Kinsella wrote:
> >>>>> Hi Syed,
> >>>>> If you do listAttributes(ensembl) in biomaRt you will see the 'F'
> >>>>> (attribute number 3):
> >>>>> > listAttributes(ensembl)
> >>>>>             name     description
> >>>>> 1 affy_zebrafish  Affy zebrafish
> >>>>> 2 agilent_g2518a  Agilent g2518a
> >>>>> 3 agilent_g2519F  Agilent g2519f
> >>>>> 4  agilent_probe   Agilent Probe
> >>>>> etc...
> >>>>> So the capital F is in the internal name and the lowercase 'f' in
> >>>>> the display name which we see on the mart interface. If I do the
> >>>>> same query using agilent_g2518a (which has lowercase a in filter
> >>>>> and attribute) I can pull out results. I agree it is not ideal to
> >>>>> change these internal names, but in this case I think it will be
> >>>>> necessary to fix the problem.
> >>>>> Regards,
> >>>>> Rhoda
> >>>>> On 15 Apr 2009, at 10:19, Syed Haider wrote:
> >>>>>> Which ever is the case, the name even if with small 'f' still is
> >>>>>> not the same as with_*. What i see right now is that both att and
> >>>>>> filt have small 'f'. I wonder where is the capital 'F' is coming
> >>>>>> from ?
> >>>>>>
> >>>>>> we also need to be careful in changing names since other clients
> >>>>>> also have saved queries which might break.
> >>>>>>
> >>>>>> Syed
> >>>>>>
> >>>>>>
> >>>>>> Damian Smedley wrote:
> >>>>>>> On Wed, Apr 15, 2009 at 9:31 AM, Syed Haider <[email protected] 
> >>>>>>> <mailto:[email protected]>> wrote:
> >>>>>>>  Hi Ruben,
> >>>>>>>  Have you tried executing your query getBM(...) with  
> >>>>>>> 'agilent_gf2519f' ? biomaRt throws exception in both case  
> >>>>>>> (agilent_gf2519f and agilent_gf2519F). The real attribute is  
> >>>>>>> 'agilent_gf2519f' which works fine from www.biomart.org  
> >>>>>>> <http://www.biomart.org>. I am not sure how the new 
> release is 
> >>>>>>> going  to fix this bug, may be i am missing something 
> here. By 
> >>>>>>> trying it on  R, i feel that its a problem with R API of 
> >>>>>>> biomaRt, cc'ing Steffen  who would know how to debug this.
> >>>>>>> Its a few years since I was involved with this but I have a 
> >>>>>>> feeling BioConductor hard codes the names of attributes and 
> >>>>>>> filters you can use and if these don't match the 
> names in the EnsemblMart config then things break.
> >>>>>>> At least that used to be the case. So it seems either 
> Rhoda has 
> >>>>>>> to change the name back in the next ensembl release 
> or the hard 
> >>>>>>> coded name in BioConductor will need changing. 
> Steffen will know 
> >>>>>>> more Cheers Damian
> >>>>>>>   Best,
> >>>>>>>  Syed
> >>>>>>>  Ruben wrote:
> >>>>>>>      Hi Syed,
> >>>>>>>      in biomaRt from bioconductor the attribute name is
> >>>>>>>      'agilent_g2519F'. If I execute this code
> >>>>>>>       > ensembl = useMart("ensembl");
> >>>>>>>       > ensembl = useDataset("drerio_gene_ensembl", 
> mart=ensembl);
> >>>>>>>       > atr <- listAttributes(ensembl)
> >>>>>>>       > atr$name[3]
> >>>>>>>      I get this:
> >>>>>>>       > [1] "agilent_g2519F"
> >>>>>>>      It seems that there is a bug, but the new 
> release will fix it.
> >>>>>>>      Thanks again,
> >>>>>>>      Rubén.
> >>>>>>>      Syed Haider wrote:
> >>>>>>>          Ruben,
> >>>>>>>          the attribute name is: 'agilent_g2519f' not 
> 'agilent_g2519F'
> >>>>>>>          hope this works.
> >>>>>>>          Best,
> >>>>>>>          Syed
> >>>>>>>          Rhoda Kinsella wrote:
> >>>>>>>              Hi Ruben,
> >>>>>>>              I suspect that there is inconsistency between the
> >>>>>>>              spelling of the internal name of the 
> filter and the
> >>>>>>>              attribute. I will look into it and try 
> to fix it for
> >>>>>>>              release 54 (approx end of April). Many 
> apologies for any
> >>>>>>>              inconvenience caused.
> >>>>>>>              Regards,
> >>>>>>>              Rhoda
> >>>>>>>              On 14 Apr 2009, at 12:27, Ruben wrote:
> >>>>>>>                  Hi to all,
> >>>>>>>                  I am trying to invoke the following R code,
> >>>>>>>                  ensembl = useMart("ensembl");
> >>>>>>>                  ensembl = useDataset("drerio_gene_ensembl",
> >>>>>>>                  mart=ensembl);
> >>>>>>>                  ids <- getBM(attributes =
> >>>>>>>                  c("ensembl_gene_id","agilent_g2519F"), filters =
> >>>>>>>                  "with_agilent_g2519f", values =TRUE,mart=ensembl);
> >>>>>>>                  but the result is always the same:
> >>>>>>>                  1 Query ERROR: caught BioMart::Exception::Usage:
> >>>>>>>                  Attribute agilent_g2519F NOT FOUND
> >>>>>>>                  Error en getBM(attributes = c("ensembl_gene_id",
> >>>>>>>                  "agilent_g2519F"), filters =
> >>>>>>>                  c("with_agilent_g2519f"),  :
> >>>>>>>                  Number of columns in the query result doesn't equal
> >>>>>>>                  number of attributes in query.  This is probably an
> >>>>>>>                  internal error, please report.
> >>>>>>>                  However, if I retrieve the list of available
> >>>>>>>                  attributes I can see 
> "agilent_g2519F" in the list.
> >>>>>>>                  Can you help me? There is a mistake 
> in my code or
> >>>>>>>                  there is something wrong in biomart?
> >>>>>>>                  Thanks in advance,
> >>>>>>>                  Rubén.
> >>>>>>>              Rhoda Kinsella Ph.D.
> >>>>>>>              Ensembl Bioinformatician,
> >>>>>>>              European Bioinformatics Institute (EMBL-EBI),
> >>>>>>>              Wellcome Trust Genome Campus,
> >>>>>>>              Hinxton
> >>>>>>>              Cambridge CB10 1SD,
> >>>>>>>              UK.
> >>>>> Rhoda Kinsella Ph.D.
> >>>>> Ensembl Bioinformatician,
> >>>>> European Bioinformatics Institute (EMBL-EBI), Wellcome Trust 
> >>>>> Genome Campus, Hinxton Cambridge CB10 1SD, UK.
> >>> Rhoda Kinsella Ph.D.
> >>> Ensembl Bioinformatician,
> >>> European Bioinformatics Institute (EMBL-EBI), Wellcome 
> Trust Genome 
> >>> Campus, Hinxton Cambridge CB10 1SD, UK.
> 
