On 04/08/2017 12:29 PM, Aaron Lun wrote:
Martin Morgan wrote:On 04/08/2017 08:16 AM, Aaron Lun wrote:On 07/04/17 20:46, luke-tier...@uiowa.edu wrote:On Fri, 7 Apr 2017, Hervé Pagès wrote:On 04/07/2017 05:37 AM, luke-tier...@uiowa.edu wrote:On Fri, 7 Apr 2017, Hervé Pagès wrote:On 04/06/2017 03:29 AM, Michael Lawrence wrote:On Thu, Apr 6, 2017 at 2:59 AM, Martin Morgan <martin.mor...@roswellpark.org> wrote:On 04/06/2017 05:33 AM, Aaron Lun wrote:The tool is not perfect, so assess each reportcarefully.I get a lot of warnings because the tool seems to consider thatextracting an attribute (with getAttrib(x, ...)) or extracting the slot of an S4 object (with GET_SLOT(x, ...) or R_do_slot(x, ...)) returns an SEXP that needs protection. I always assumed that it didn't because my understanding is that the returned SEXP is pointing to a part of a pre-existing object ('x') and not to a newly created one. So I decided I could treat it like the SEXP returned by VECTOR_ELT(), which, AFAIK, doesn't need protection.So I suspect these warnings are false positives but I'm not 100%sure. If you are not 100% sure then you should protect :-) There are some cases, in particular related to compact row names on data frames, where getAttrib will allocate.Seriously? So setAttrib(x, ..., getAttrib) is not going to be a no-op anymore? Should I worry that VECTOR_ELT() will also expand some sort of compact list element? Why not keep these things low-level getters/setters that return whatever the real thing is and use higher-level accessors for returning the expanded version of the thing?Seriously: it's been that way since r37807 in 2006. If you want to read about some related future directions you can look at https://svn.r-project.org/R/branches/ALTREP/ALTREP.html. lukeI was curious about this so I checked out what R-exts had to say involving set/getAttrib. Funnily enough, the example it gives in Section 5.9.4 seems to be incorrect in its UNPROTECTing. #include <R.h> #include <Rinternals.h> SEXP out(SEXP x, SEXP y) { int nx = length(x), ny = length(y); SEXP ans = PROTECT(allocMatrix(REALSXP, nx, ny)); double *rx = REAL(x), *ry = REAL(y), *rans = REAL(ans); for(int i = 0; i < nx; i++) { double tmp = rx[i]; for(int j = 0; j < ny; j++) rans[i + nx*j] = tmp * ry[j]; } SEXP dimnames = PROTECT(allocVector(VECSXP, 2)); SET_VECTOR_ELT(dimnames, 0, getAttrib(x, R_NamesSymbol)); SET_VECTOR_ELT(dimnames, 1, getAttrib(y, R_NamesSymbol)); setAttrib(ans, R_DimNamesSymbol, dimnames); UNPROTECT(3); return ans; } There are two PROTECT calls but an UNPROTECT(3), which triggers a stack imbalance warning upon trying to run .Call("out", ...) in R.Yes, that should be UNPROTECT(2). svn blame says the error was introduced when allocMatrix() was introduced; prior to that the code had allocVector(), then set dim and dimnames. As for whether to PROTECT or not, my analysis would be... SET_VECTOR_ELT does not (currently) allocate (except on error) so there is no opportunity for the garbage collector to run, hence no need to PROTECT. Further, getAttrib() (currently) allocates only if (1) the attribute is R_RowNamesSymbol and the row names are stored in compact format c(NA_integer_, nrow); or (2) the first argument is a classic pairlist or language SEXP. None of these conditions apply, so the return value of getAttrib() is PROTECTed anyway. Luke's analysis would be more straight-forward: if in doubt, PROTECT. I think Herve, Gabe, and perhaps Michael would take Luke's advice, and maybe also note that my advice, in addition to being an analysis of some surprisingly complicated code by a practitioner of dubious credibility, involves the current state of affairs, and you never know (and apparently ALTREP makes it less certain) what the future will hold. So they'd probably say PROTECT. One might be tempted to write SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol))); but I'm not sure whether C guarantees that function arguments are fully evaluated, maybe it's legal for a compiler to evaluate getAttrib(), then do some more work with other arguments, then evaluate PROTECT(), so long as the overall logic is not disrupted. So the 'if in doubt' argument would make me write SEXP nms = PROTECT(getAttrib(x, R_NamesSymbol)); SET_VECTOR_ELT(dimnames, 0, nms); I think , in C is called a 'sequence point'. Google takes me to https://msdn.microsoft.com/en-us/library/azk8zbxd.aspx where it seems like the left operand of ',' are fully evaluated before proceeding, and furthermore that arguments to functions are evaluated before entering the function, implying that it is safe to use the one-liner SET_VECTOR_ELT(dimnames, 0, PROTECT(getAttrib(x, R_NamesSymbol))); At any rate I changed the example in R-exts to UNPROTECT(2), leaving the nuances for further discussion. MartinI wonder if the following is a sensible idea: int Rf_num_protected; // global variable void Rf_start_protection() { Rf_num_protected=0; return; } SEXP Rf_add_protection(SEXP x) { ++Rf_num_protected; return PROTECT(x); } void Rf_end_protection() { UNPROTECT(Rf_num_protected); return; } The idea would be to: 1. call Rf_start_protection() at the top of the native routine 2. replace all uses of PROTECT with Rf_add_protection 3. call Rf_end_protection() just before returning to R This would avoid having to keep track of the number of PROTECTs performed, which may not be trivial if the routine can return at multiple points. It might also useful for C++ native routine creating class instances that need to do internal PROTECTs for the lifetime of the instance. As long as those PROTECTs are done via Rf_add_protection(), a single Rf_end_protection() call at the bottom of the top-level routine would be sufficient to handle them all. In contrast, putting a matching UNPROTECT in the class destructor is not safe, as it is possible to trigger the destructor to UNPROTECT an unrelated SEXP: SEXP blah(SEXP x) { my_class* ptr=new my_class(x); // say this does an internal PROTECT SEXP output=PROTECT(allocVector(INTSXP, 1)); // ... do something with output here... delete ptr; // if UNPROTECT is in the destructor, it UNPROTECTs output instead // ... do some more stuff, possibly involving allocations ... UNPROTECT(1); // this actually UNPROTECTs whatever was in my_class return output; }
Global variables are problematic to reason about, e.g., in nested calls or parallel code sections.
'ad hoc' (no offense intended) solutions often increase rather than reduce cognitive burden, because someone new to the code (including one's future self) has to parse the intention and validate use.
Rcpp seems like the right approach for C++ code; it largely removes the need for explicit PROTECTion management, and is widely used and responsibly maintained so the edge cases / tricky problems get discovered and addressed.
Martin
Anyway, getting back to the topic of this thread; if we were to pretend that getAttrib() allocates in the above example, would that mean that both getAttrib() calls now need to be PROTECTed by the developer? Or is this handled somewhere internally?Thanks, H.Best, lukeI also get a warning on almost every C++function I've written,because I use the following code to handle exceptions:SEXP output=PROTECT(allocVector(...));try { // do something that might raise an exception } catch (std::exception& e) { UNPROTECT(1); throw; // break out of this part of the function } UNPROTECT(1); return output;Presumably the check doesn't account for transfer ofcontrol to > > > > thecatch block. I find that R itself is pretty good atcomplaining > > > > aboutstack imbalances during execution of tests, examples, etc.'My' packages(Rsamtools, DirichletMultinomial) had several falsepositives > > > > > (allassociated with use of an attribute of a protected SEXP),one > > > > > subtleproblem (a symbol from a PROTECT'ed package name space;the > > > > > symbolcould in theory be an active binding and the value obtained not PROTECTed by the name space), and a genuine bugtag = NEW_CHARACTER(n);for (int j = 0; j < n; ++j) SET_STRING_ELT(tag, j, NA_STRING); if ('A' == aux[0]) { buf_A = R_alloc(2, sizeof(char)); #<<- bugbuf_A[1] = '\0'; } ... SET_VECTOR_ELT(tags, i, tag); # PROTECTtag, toolate!I assume the bug refers to the un-PROTECT'dNEW_CHARACTER here - > > > > theR_alloc call looks okay to me...yes, tag needs protection.I attributed the bug to R_alloc because I failed toreason that > > > R_alloc(obviously) allocates and hence can trigger a garbage collection.Somehow it reflects my approach to PROTECTion, probablynot shared > > > byeveryone. I like to PROTECT only when necessary, rather than indiscriminately. Probably this has no practical consequence in terms of performance, making the code a little easier to read at theexpense > > > ofexposing me to bugs like this.I guess it's a tradeoff between syntactic complexityand logicalcomplexity. You have to think pretty hard to minimize use of the protect stack. I prefer to call it logical obscurity ;-) The hard thinking consists in assessing whether or not the codebetweenthe line where a new SEXP is allocated (line X) and the line where it's put in a safe place (line Y) can trigger garbage collection. Hard to figure out in general indeed, but not impossible! Problem is that the result of this assessment is valid at a certain point in time but might change in the future, even if your code has not changed.So a dangerous game for virtually zero benefits.One recommendation might be to UNPROTECT() as soon as thepointer onthe top is unneeded, rather than trying to figure out the number to pop just before returning to R. If you PROTECT() in a loop, you definitely want to do that.Otherwise,does it make a big difference?One thing that got me is that the order in which Cevaluates functioncall arguments is undefined. I did a lot of R_setAttrib(x, install("foo"), allocBar()), thinking that the symbol would be automatically protected, and allocBar() would not need protection, since it happened last. Unfortunately, it might be evaluated first. I got hit by this too long time ago but with defineVar() instead ofR_setAttrib():https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_r-2Ddevel_2008-2DJanuary_048040.html&d=DwID-g&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=FscW1HcPCwUqtMwKVFDfd1NyW0oHh0tJOPdFb3C1IWk&s=O3CcB-Z_OkVKaC1aV0aIc5SCDNqGQrkvGSmPf0TRAsw&e=H.Btw, I think my package RGtk2 got the record: 1952 errors.Luckilyalmost all of them happened inside a few macros and autogenerated code.MartinCheers,Aaron_______________________________________________ Bioc-devel@r-project.org mailing listhttps://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=This email message may containlegally privilegedand/or...{{dropped:2}}_______________________________________________Bioc-devel@r-project.org mailing listhttps://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=_______________________________________________Bioc-devel@r-project.org mailing listhttps://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=su20YzStywMa_pdWblzF0RnK8ATRw-t61lOIOsi0xTU&s=JhBOw1ac5wXfV1BSjFuidxFiBTx43J7iEvZG4G0_0uU&e=_______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-develThis email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you._______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or...{{dropped:2}} _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel