Re: [Bioc-devel] R6 class v.s. S4 class

Chunlei Wu Thu, 19 Oct 2017 21:24:08 -0700

Thank you all for the feedback. Just to give some extra context, here we have 
the Python and Javascript versions of the biothings_client:



https://github.com/biothings/biothings_client.py


https://github.com/biothings/biothings_client.js


And here is the work-in-progress R client:


https://github.com/biothings/biothings_client.R



You can find some examples from the README and the test code to see how the 
client works in Python and Javascript.


One of the nice features of both the Python and JS clients is that they allow 
users to use the same client for any new "BioThings" API in the future, which 
can be created by another user, not just by us. In this case, one can do this 
to work with a new API in the Python client:


from biothings_client import get_client

mything_client = get_client("mything", url="http://example.com/v1/api")   # could have some extra parameters

mything_client.query(...)

mything_client.get_mything(...)

...


As the developers of all three biothings_clients, we would, of course, like to 
keep the same pattern for R, and R6 looks the closest to me. But it looks like, 
from R users' perspective, this is not a popular pattern to use.  With your 
suggestions, I think it can work this way in R:


library(biothings)

gene_client = BioThingsClient('gene')     # a gene client with a preset config

queryBioThings(gene_client, "CDK2")    # whether we should keep client as the
# first argument is still TBD, based on the previous pipe comment


mything_client = BioThingsClient('mything', url="http://example.com/v1/api")

queryBioThings(mything_client, "something")



Another thing I should mention: in the Python client, each client has these methods:


gene_client.getgene

gene_client.getgenes

gene_client.query

gene_client.querymany

gene_client.metadata


Then in R, we will have to create these generic methods (hope this is the right 
term):


getBioThing(mything_client, ...)

getBioThings

queryBioThings

queryManyBioThings

BioThingsMetadata


I personally still like the Python/JS pattern, as you can have client-specific 
names like "getgene" and "getgenes" instead of the generic getBioThing and 
getBioThings names. Plus, users can just name the "gene_client" part "gc" or 
whatever, so there is much less to type :-) in the code. In the R S4 case, the 
function names have to be more verbose because they are global.
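
To make the client-specific naming concrete, here is a minimal Python sketch 
of how a factory could attach names like "getgene" and "getgenes" to a client. 
It is not how biothings_client is implemented: SketchClient, make_client and 
the URL layout are all made up for illustration, and no HTTP request is made.

```python
# Hypothetical sketch: none of these names come from biothings_client itself.
class SketchClient:
    """Toy client that only records which URL a call would request."""

    def __init__(self, entity, url):
        self.entity = entity
        self.url = url.rstrip("/")

    def _get_one(self, item_id):
        # A real client would issue an HTTP request here; this sketch
        # just returns the (assumed) URL it would have requested.
        return f"{self.url}/{self.entity}/{item_id}"

    def _get_many(self, item_ids):
        return [self._get_one(item_id) for item_id in item_ids]


def make_client(entity, url):
    """Build a client whose getter names embed the entity name."""
    client = SketchClient(entity, url)
    # Expose the generic getters under entity-specific aliases,
    # e.g. getgene/getgenes for the entity "gene".
    setattr(client, f"get{entity}", client._get_one)
    setattr(client, f"get{entity}s", client._get_many)
    return client


gc = make_client("gene", "http://example.com/v1/api")
print(gc.getgene("some_id"))
print(gc.getgenes(["id_a", "id_b"]))
```

The short names live on the client object rather than in a global namespace, 
which is why they can stay short without clashing with other packages.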


Does this sound good to the group? Any more suggestions?


Chunlei













________________________________
From: Michael Lawrence <lawrence.mich...@gene.com>
Sent: Thursday, October 19, 2017 8:32 PM
To: Martin Morgan
Cc: Charles Plessy; bioc-devel@r-project.org; Chunlei Wu
Subject: Re: [Bioc-devel] R6 class v.s. S4 class

API discoverability is a big problem in languages with a functional syntax. 
Namespaces are verbose, but they do provide for constrained autocompletion. 
Prefixing all symbols with an abbreviation like "bt_" seems too ad hoc to me, 
but it is common practice. Explicitly querying for methods takes the user out 
of the flow.

One could imagine an IDE showing available methods in the tooltip of function 
symbols.

I guess an IDE could support autocompleting on "(object)" or "(object,", where 
<tab> would display generics with applicable methods and fill in the name in 
front of the "(". Not very intuitive though.

By simplifying our APIs we make discoverability less of an issue, because they 
are easily listed on cheat sheets and memorized.

I wonder if there are ideas to steal from Julia.

On Thu, Oct 19, 2017 at 7:36 PM, Martin Morgan <martin.mor...@roswellpark.org> wrote:
On 10/19/2017 09:24 PM, Charles Plessy wrote:
(Just sharing my thoughts, as these days I am spending quite
some time preparing the upgrade of a Bioconductor package).

On Fri, Oct 20, 2017 at 12:50:48AM +0000, Ryan Thompson wrote:

gene_client <- BioThingsClient("gene")
query("CDK2", client=gene_client)

In addition, since the piping operator (%>%) of dplyr and magrittr is
gaining traction, I would recommend carefully considering which argument
will be the first argument of the function:

With the client as first argument, one can then write things like:

     gene_client %>% query("CDK2")  # similar to query(gene_client, "CDK2")

The Bioconductor convention would use S4 objects with CamelCase constructors.

  geneClient = BioThingsGeneClient()  ## or just GeneClient()

I agree with enabling the use of the pipe, and think the generic + methods 
should have a signature where the first argument is the client rather than the 
pattern against which the query occurs. There is to some extent an argument for 
name-mangling in the generic (other knowledgeable people disagree) so that one 
is free to implement contracts unique to the package in question, and avoid 
conflicts with generics of the same name in other packages (e.g., 
AnnotationDbi::select() / dplyr::select()).

  setGeneric(
    "btQuery",
    function(x, query, ...) standardGeneric("btQuery")
  )

  setMethod(
    "btQuery", "GeneClient",
    function(x, query, ...)  ## formals match the generic, including '...'
  {
    ## implementation
  })

  btQuery(geneClient, "CDK2")  ## maybe btquery(...)

Yes, one could use BioThings::query(), or 
semanticallyInformativeAlternativeToQuery(), but these seem cumbersome to me, 
and the first at least has rough edges (that of course should be fixed...), 
e.g.,

  > methods(AnnotationHub::query)
  Error in .S3methods(generic.function, class, parent.frame()) :
    no function 'AnnotationHub::query' is visible

I think Michael is arguing for something like plain-old-functions (and the 
original examples and problems of multiplying methods seemed somehow to be 
plain old functions rather than S4 generics and methods?)

  geneQuery <- function(x, query) ...

A downside is that one cannot discover programmatically what one can do with a 
GeneClient object (if it were a method, one could ask for 
methods(class=class(geneClient))); as a developer one also needs to validate 
the incoming argument, which requires a certain but not insurmountable 
discipline.
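
As a rough Python analogue of this trade-off (the thread itself is about R's 
S4, so this is only an illustration), functools.singledispatch provides a 
generic function whose registered implementations can be looked up per class, 
whereas a plain function keeps no such record. GeneClient, bt_query and 
generics_for are hypothetical names chosen for the sketch.

```python
from functools import singledispatch


class GeneClient:
    """Hypothetical client class, used only for illustration."""


@singledispatch
def bt_query(client, query):
    # Fallback for unsupported client types: dispatch takes over the
    # argument validation that a plain function would do by hand.
    raise TypeError(f"no bt_query method for {type(client).__name__}")


@bt_query.register(GeneClient)
def _bt_query_gene(client, query):
    # Placeholder implementation for GeneClient objects.
    return f"gene query: {query}"


def generics_for(cls, generics):
    """List the names of generics with a method registered for cls."""
    return [g.__name__ for g in generics if cls in g.registry]


print(generics_for(GeneClient, [bt_query]))
print(bt_query(GeneClient(), "CDK2"))
```

With a plain function such as geneQuery there is nothing like the registry to 
inspect, so the link between the class and what can be done with it exists 
only in the documentation.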

Michael didn't mention it, but these slides of his are relevant


https://bioconductor.org/help/course-materials/2017/BioC2017/DDay/BOF/usability.pdf

One other lesson from the annotation world is to think carefully about the 
structure of the return, in particular about 1:1 versus 1:many mappings between 
a vector-valued 'pattern=' and the results. While it's tempting to return, say, 
a character vector or named list, these days one probably wants to take the 
lessons of tidy data and return a data.frame-like object where the first column 
is the query and the second and subsequent columns are the result of the query. 
(A DataFrame() would do, but maybe that's not 'necessary'; nothing wrong with a 
tibble, but a data.table is not likely necessary or particularly advised 
because of its novel syntax and reference semantics.) One wants to pay 
particular attention to dealing with 1:0 and 1:many mappings in ways that do 
not confuse users. Some use cases (e.g., adding annotations to the rowData() of 
a SummarizedExperiment) are really facilitated by a 1:1 mapping between query 
and response.
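
The shape of such a return can be sketched in Python, the language of the 
client example earlier in the thread. This is only an illustration under 
assumptions: tabulate_hits, the lookup table and the identifiers in it are all 
made up, and a real client would return a data.frame-like object rather than a 
list of tuples.

```python
def tabulate_hits(queries, lookup):
    """Return one (query, result) row per hit, with the query first.

    A query with no hit yields a single row whose result is None (1:0);
    a query with several hits yields one row per hit (1:many).
    """
    rows = []
    for query in queries:
        hits = lookup.get(query, [])
        if not hits:
            # Keep the unmatched query visible instead of dropping it.
            rows.append((query, None))
        for hit in hits:
            rows.append((query, hit))
    return rows


# Made-up lookup table standing in for an annotation service.
toy_lookup = {"CDK2": ["id_1"], "AMBIG": ["id_a", "id_b"]}

rows = tabulate_hits(["CDK2", "AMBIG", "NOPE"], toy_lookup)
for row in rows:
    print(row)
```

Every input query appears at least once in the first column, so a caller can 
tell a missing match from a dropped one and can decide explicitly how to 
collapse 1:many rows when a 1:1 mapping is needed.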

Martin


With the gene symbol as first argument:

     "CDK2" %>% query(gene_client)  # similar to query("CDK2", gene_client)

If gene symbols may come as output from other commands and the query
function is able to work smartly with a vector of gene symbols as input,
then the second pattern might be useful.  Otherwise the first pattern
probably makes more sense.

See https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html 
for details.

(Note however that the piped and non-piped functions are not exactly
equivalent, and that piped commands can be harder to debug; therefore
it may be better to only use them in interactive sessions.)

Have a nice day,





_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

