Dear Martin,

> On 17 Apr 2015, at 14:00, Martin Morgan <mtmor...@fredhutch.org> wrote:
>
> On 04/13/2015 02:48 AM, Thomas Maurel wrote:
>> Dear Martin,
>>
>> I have investigated with our Web team and we believe that the command
>> attempts to open a number of concurrent sessions in order to download all of
>> the files. If that is the case then the problem is that our ftp server is
>> configured to limit the number of concurrent sessions per user in order to
>> prevent people using scripts to monopolise the server resources (and in some
>> cases accidentally DoS attack the server).
>
> Hi Thomas -- thank you for trouble-shooting this.
>
> The code used getURL(url, ...) without specifying a curl= argument. This
> causes a new CURLHandle to be constructed for each call to getURL(). These
> are closed when the garbage collector is run, but that is apparently too
> infrequent, and expensive to run explicitly.
>
> I updated the code to include the argument
>
>   curl=httr::handle_find(url)$handle
>
> which re-uses httr's pool of url-specific handlers hence limiting the number
> of simultaneous open connections. This seems to have been effective.
>
> Thanks again,
>
> Martin

Thanks a lot for letting me know, I am happy to hear that you got to the bottom of this issue.
Regards,
Thomas
>
>
>>
>> Hope this helps,
>> Regards,
>> Thomas
>>> On 10 Apr 2015, at 13:40, Thomas Maurel <mau...@ebi.ac.uk> wrote:
>>>
>>> Hi Martin,
>>>
>>>> On 10 Apr 2015, at 13:23, Martin Morgan <mtmor...@fredhutch.org> wrote:
>>>>
>>>> On 04/10/2015 04:34 AM, Rainer Johannes wrote:
>>>>> hi Martin,
>>>>>
>>>>> but if that's true, then I will never have a way to test whether the
>>>>> recipe actually works, right?
>>>>
>>>> I guess I don't really know what I'm talking about, and that insert=FALSE
>>>> is intended to not actually do the insertion so that the (immediate)
>>>> problem is not with AnnotationHubData.
>>>>
>>>> From the traceback below it seems like the error occurs in calls like the
>>>> following
>>>>
>>>> library(RCurl)
>>>> getURL("ftp://ftp.ensembl.org/pub/release-78/gtf/ailuropoda_melanoleuca/",
>>>>        dirlistonly=TRUE)
>>>>
>>>> This seems to sometimes work and sometimes not
>>>>
>>>> > urls[1]
>>>> [1] "ftp://ftp.ensembl.org/pub/release-78/gtf/ailuropoda_melanoleuca/"
>>>> > getURL(urls[1], dirlistonly=TRUE)
>>>> [1] "Ailuropoda_melanoleuca.ailMel1.78.gtf.gz\nCHECKSUMS\nREADME\n"
>>>> > getURL(urls[1], dirlistonly=TRUE)
>>>> [1] "Ailuropoda_melanoleuca.ailMel1.78.gtf.gz\nCHECKSUMS\nREADME\n"
>>>> > getURL(urls[1], dirlistonly=TRUE)
>>>> Error in function (type, msg, asError = TRUE) : Access denied: 530
>>> You are right, I've noticed the same thing. I will investigate and see if
>>> there is something wrong with our FTP site machine.
>>>
>>> Regards,
>>> Thomas
>>>>
>>>>
>>>>>
>>>>> that's the full traceback:
>>>>>
>>>>> > updateResources(AnnotationHubRoot=getWd(), BiocVersion=biocVersion(),
>>>>>     preparerClasses="EnsemblGtfToEnsDbPreparer", insert=FALSE,
>>>>>     metadataOnly=TRUE)
>>>>> INFO [2015-04-10 13:32:18] Preparer Class: EnsemblGtfToEnsDbPreparer
>>>>> Ailuropoda_melanoleuca.ailMel1.78.gtf.gz
>>>>> Anas_platyrhynchos.BGI_duck_1.0.78.gtf.gz
>>>>> Anolis_carolinensis.AnoCar2.0.78.gtf.gz
>>>>> Astyanax_mexicanus.AstMex102.78.gtf.gz
>>>>> Bos_taurus.UMD3.1.78.gtf.gz
>>>>> Caenorhabditis_elegans.WBcel235.78.gtf.gz
>>>>> Callithrix_jacchus.C_jacchus3.2.1.78.gtf.gz
>>>>> Error in function (type, msg, asError = TRUE) : Access denied: 530
>>>>> > traceback()
>>>>> 17: fun(structure(list(message = msg, call = sys.call()),
>>>>>         class = c(typeName, "GenericCurlError", "error", "condition")))
>>>>> 16: function (type, msg, asError = TRUE) {
>>>>>         if (!is.character(type)) {
>>>>>             i = match(type, CURLcodeValues)
>>>>>             typeName = if (is.na(i)) character() else names(CURLcodeValues)[i]
>>>>>         }
>>>>>         typeName = gsub("^CURLE_", "", typeName)
>>>>>         fun = (if (asError) stop else warning)
>>>>>         fun(structure(list(message = msg, call = sys.call()),
>>>>>             class = c(typeName, "GenericCurlError", "error", "condition")))
>>>>>     }(67L, "Access denied: 530", TRUE)
>>>>> 15: .Call("R_curl_easy_perform", curl, .opts, isProtected, .encoding,
>>>>>         PACKAGE = "RCurl")
>>>>> 14: curlPerform(curl = curl, .opts = opts, .encoding = .encoding)
>>>>> 13: getURL(url, dirlistonly = TRUE)
>>>>> 12: strsplit(getURL(url, dirlistonly = TRUE), "\n")
>>>>> 11: (function (url, filename, tag, verbose = TRUE) {
>>>>>         df2 <- strsplit(getURL(url, dirlistonly = TRUE), "\n")[[1]]
>>>>>         df2 <- df2[grep(paste0(filename, "$"), df2)]
>>>>>         drop <- grepl("latest", df2) | grepl("00-", df2)
>>>>>         df2 <- df2[!drop]
>>>>>         df2 <- paste0(url, df2)
>>>>>         result <- lapply(df2, function(x) {
>>>>>             if (verbose) message(basename(x))
>>>>>             tryCatch({
>>>>>                 h = suppressWarnings(GET(x, config = config(nobody = TRUE,
>>>>>                     filetime = TRUE)))
>>>>>                 nams <- names(headers(h))
>>>>>                 if ("last-modified" %in% nams)
>>>>>                     headers(h)[c("last-modified", "content-length")]
>>>>>                 else c(`last-modified` = NA, `content-length` = NA)
>>>>>             }, error = function(err) {
>>>>>                 warning(basename(x), ": ", conditionMessage(err))
>>>>>                 list(`last-modified` = character(),
>>>>>                     `content-length` = character())
>>>>>             })
>>>>>         })
>>>>>         size <- as.numeric(sapply(result, "[[", "content-length"))
>>>>>         date <- strptime(sapply(result, "[[", "last-modified"),
>>>>>             "%a, %d %b %Y %H:%M:%S", tz = "GMT")
>>>>>         data.frame(fileurl = url, date, size, genome = tag,
>>>>>             stringsAsFactors = FALSE)
>>>>>     })(dots[[1L]][[8L]], filename = dots[[2L]][[1L]], tag = dots[[3L]][[8L]])
>>>>> 10: mapply(FUN = f, ..., SIMPLIFY = FALSE)
>>>>> 9: Map(.ftpFileInfo, urls, filename = "gtf.gz", tag = basename(urls))
>>>>> 8: do.call(rbind, Map(.ftpFileInfo, urls, filename = "gtf.gz",
>>>>>        tag = basename(urls)))
>>>>> 7: .ensemblGtfSourceUrls(.ensemblBaseUrl, justRunUnitTest)
>>>>> 6: makeAnnotationHubMetadataFunction(currentMetadata,
>>>>>        justRunUnitTest = justRunUnitTest, ...)
>>>>> 5: .generalNewResources(importPreparer, currentMetadata,
>>>>>        makeAnnotationHubMetadataFunction, justRunUnitTest, ...)
>>>>> 4: .local(importPreparer, currentMetadata, ...)
>>>>> 3: newResources(preparerInstance, listOfExistingResources,
>>>>>        justRunUnitTest = justRunUnitTest)
>>>>> 2: newResources(preparerInstance, listOfExistingResources,
>>>>>        justRunUnitTest = justRunUnitTest)
>>>>> 1: updateResources(AnnotationHubRoot = getWd(), BiocVersion = biocVersion(),
>>>>>        preparerClasses = "EnsemblGtfToEnsDbPreparer", insert = FALSE,
>>>>>        metadataOnly = TRUE)
>>>>> >
>>>>>
>>>>>
>>>>>> On 10 Apr 2015, at 13:09, Martin Morgan <mtmor...@fredhutch.org> wrote:
>>>>>>
>>>>>> traceback()
>>>>>
>>>>
>>>>
>>>> --
>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N.
>>>> PO Box 19024 Seattle, WA 98109
>>>>
>>>> Location: Arnold Building M1 B861
>>>> Phone: (206) 667-2793
>>>
>>> --
>>> Thomas Maurel
>>> Bioinformatician - Ensembl Production Team
>>> European Bioinformatics Institute (EMBL-EBI)
>>> European Molecular Biology Laboratory
>>> Wellcome Trust Genome Campus
>>> Hinxton
>>> Cambridge CB10 1SD
>>> United Kingdom
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioc-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>> --
>> Thomas Maurel
>> Bioinformatician - Ensembl Production Team
>> European Bioinformatics Institute (EMBL-EBI)
>> European Molecular Biology Laboratory
>> Wellcome Trust Genome Campus
>> Hinxton
>> Cambridge CB10 1SD
>> United Kingdom
>>
>
>
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793

--
Thomas Maurel
Bioinformatician - Ensembl Production Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
United Kingdom

[[alternative HTML version deleted]]

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel