Hi Martin, Thanks for spending the time and effort to refactor and integrate my code into BiocCheck. I didn't intend to give you so much to think about !
For the record, I used the "checkFunctionLengths" method (now starting at line 829 of checks.R) as a template to offer code somewhat homogenous with what I could see in place already. I believe that function should also be refactored using your new "Context" and "handleContext" helpers, right? Side note: Far from trying to compete with the BiocCheck Bioconductor package, I couldn't help but update my "follow-up" "BiocCheckTools" Python code <https://github.com/kevinrue/BiocCheckTools/blob/master/README.md> in the meantime (sorry for unoriginal name, I'll take suggestions!) taking Kasper's advice on board: - Single master file which runs everything and gives a report. - *Bonus*: each module can be run separately (line_chars; tab_width). - All line numbers are reported - Content of the line is not displayed - I didn't mark up TABs, but I indicate how many spaces they are made of (when not a multiple of 4) All the best, Kevin On Wed, Nov 9, 2016 at 11:31 PM, Martin Morgan < martin.mor...@roswellpark.org> wrote: > On 11/03/2016 08:14 AM, Kevin RUE wrote: > >> Apologies for the additional spam, for two reasons: >> >> * The diff files that I've previously sent had the base and modified >> versions swapped. This new one fixes that. >> * This new diff file (always relative to the code I cloned from >> Bioconductor-mirror) also fixes a bug whereby the updated code would >> fail on packages that do not break any of the three guidelines. >> > > Hi Kevin -- thanks for these, I incorporated a version in 1.11.1. > > The report is for the first 6 lines of offenders; the code for finding all > offenders is mentioned in the vignette > > BiocCheck:::checkFormatting("path/to/YourPackage", nlines=Inf) > > Marcel Ramos mentions the lintr (CRAN) and goodpractice (github only, > https://github.com/MangoTheCat/goodpractice) packages as useful resources > for these types of questions; also formatR (CRAN). > > For the record, it was fun to work through your code. The basic pattern > was to read lines and flag those that violated a (usually simple) test, add > these to a data frame, then if after parsing all files the data frame had > any rows report these to the user. > > I standardized the columns of the data frame across checks to facilitate > re-use, and imposed the standard by using a constructor -- non-exported > Context(). I took a little bit of pain to make the constructor work (return > a data frame with the correct columns but no rows) when there are no lines > of code flagged to be added to the context, so that I could use it without > have to insert a bunch of conditional `if (...)` in my code that make it > both harder to understand and harder to test. Also, the constructor is > vectorized, whereas your implementation was iterative. > > For reporting to the user, I wrote a small helper handleContext() that I > could re-use across the different types of checks (and likely elsewhere in > the package) and also maintain the separation of what is reported to the > user (e.g., in handleContext()) from how it is reported to the user -- > .msg, .verbatim, etc in the utils.R file. Probably the reporting could have > been refactored further. > > I feel guilty about not writing unit tests; the package does include > tests, including a handy package-generator function to make packages that > are invalid in various ways. It seems like a better way to go would be to > further refactor the body of the checkFormatting() function so that one > could for instance invoke a function checkLineLengths(pkg, file, lines) and > get back a data.frame created by Context(); it would then be easy to test > these for various 'lines' without having to figure out how to make test > packages. But this seemed of course like too much additional work for this > feature (I'll probably regret not expending the effort at a future date). > > I also found myself using the horrible 'copy-and-append' paradigm > > ctxt = Context() > for (fl in files) { > ... > ctxt = rbind(ctxt, Context(...) > } > > which makes a copy of (the increasingly large) ctxt each time through the > loop and therefore scales quadratically (when each file has some problems) > with the number of files. I'm counting on the number of files to be small > (e.g., less than 10000, when the copy-and-append pattern starts to be > expensive even for trivial operations; mzR has ~8750 files, though many of > these are not checked for formatting) so that this pattern does not become > a bottleneck. It was still painful to use... > > Martin > > >> Best, >> Kevin >> >> >> On Thu, Nov 3, 2016 at 11:49 AM, Kevin RUE <kevinru...@gmail.com >> <mailto:kevinru...@gmail.com>> wrote: >> >> Hi all, >> >> Please find attached the diff relative to the code that I cloned >> from Bioconductor-mirror yesterday (please ignore the previous diff >> file). >> >> Basically three new features: >> >> * As per previous email: display up to the first 6 lines that are >> over 80 characters long >> * *New*: display up to the first 6 lines that are not indented by >> a multiple of 4 spaces >> * *New*: display up to the first 6 lines that use TAB instead of 4 >> spaces for indentation >> >> I also attach the output of the updated code >> <https://github.com/kevinrue/BiocCheck>, to illustrate the changes. >> >> Notes: >> >> * For demonstration purpose, I indented a handful of lines from >> the checks.R file itself with TAB characters. I assume that's >> OK, as some lines were already longer than 80 characters and not >> indented by a multiple of 4 spaces. >> >> All the best, >> Kevin >> >> >> On Wed, Nov 2, 2016 at 10:00 PM, Kevin RUE <kevinru...@gmail.com >> <mailto:kevinru...@gmail.com>> wrote: >> >> Me again :) >> >> Please find attached the first patch to print the first 6 lines >> over 80 characters long. (I'll get to the tabulation offenders >> next). >> >> Note that all the offending lines are stored in the "df.length" >> data.frame. How about an option like "fullReport=c(FALSE, TRUE)" >> that print *all* the offending lines? >> The data.frame also stores the content of the lines for the >> record, but does not print them. I think Kasper is right: >> filename and line should be enough to track down the line. >> >> All the best, >> Kevin >> >> >> >> On Wed, Nov 2, 2016 at 8:08 PM, Kevin RUE <kevinru...@gmail.com >> <mailto:kevinru...@gmail.com>> wrote: >> >> Thanks for the feedback! >> >> I also tend to prefer *all* the lines being reported (or to >> be honest, that was really true when I had lots of them; a >> problem that I largely mitigated by fixing all of them once >> and subsequently paying more attention while developing). >> >> Printing the content of the offending line somewhat helps me >> spot the line faster (more so for tab issues). But I must >> admit that showing the whole line is somewhat "overkill". I >> just started thinking of a compromise being to only show the >> first N characters of the line, with N being 80 minus the >> number of characters necessary to print the filename and >> line number. >> >> Thanks Martin for pointing out the lines in BiocCheck. (Now >> I feel bad for not having checked sooner.. hehe!) >> I think the idea of BiocCheck showing the first 6 offenders >> in BiocCheck quite nice, as I rarely have more since I use >> using the RStudio "Tools > Global Options > Code > Display > >> Show Margin > Margin column: 80" feature. >> >> I'll give a go at both approaches (developing BiocCheck and >> my own scripts) >> >> Cheers, >> Kevin >> >> >> On Wed, Nov 2, 2016 at 7:41 PM, Kasper Daniel Hansen >> <kasperdanielhan...@gmail.com >> <mailto:kasperdanielhan...@gmail.com>> wrote: >> >> I would prefer all line numbers reported, but on the >> other hand I am indifferent wrt. the content of the >> line, unless (say) TABs are marked up somehow. >> >> Kasper >> >> On Wed, Nov 2, 2016 at 3:17 PM, Martin Morgan >> <martin.mor...@roswellpark.org >> <mailto:martin.mor...@roswellpark.org>> wrote: >> >> On 11/02/2016 02:49 PM, Kevin RUE wrote: >> >> Dear all, >> >> Just thought I'd share a handful of scripts that >> I wrote to follow up on >> certain NOTE messages thrown by R CMD BiocCheck. >> >> https://github.com/kevinrue/BiocCheckTools >> <https://github.com/kevinrue/BiocCheckTools> >> >> They're very simple, but I occasionally find >> them quite convenient. >> Apologies if something similar already exists >> somewhere :) >> >> >> Maybe consider creating a diff against the source >> code that, e.g., reported the first 6 offenders? The >> relevant lines are near >> >> https://github.com/Bioconducto >> r-mirror/BiocCheck/blob/master/R/checks.R#L1081 >> <https://github.com/Bioconduct >> or-mirror/BiocCheck/blob/master/R/checks.R#L1081> >> >> Martin >> >> >> All the best, >> Kevin >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioc-devel@r-project.org >> <mailto:Bioc-devel@r-project.org> mailing list >> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> <https://stat.ethz.ch/mailman/listinfo/bioc-devel >> > >> >> >> >> This email message may contain legally privileged >> and/or...{{dropped:2}} >> >> >> _______________________________________________ >> Bioc-devel@r-project.org >> <mailto:Bioc-devel@r-project.org> mailing list >> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> <https://stat.ethz.ch/mailman/listinfo/bioc-devel> >> >> >> >> >> >> >> > > This email message may contain legally privileged and/or confidential > information. If you are not the intended recipient(s), or the employee or > agent responsible for the delivery of this message to the intended > recipient(s), you are hereby notified that any disclosure, copying, > distribution, or use of this email message is prohibited. If you have > received this message in error, please notify the sender immediately by > e-mail and delete this email message from your computer. Thank you. > [[alternative HTML version deleted]] _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel