Thanks Martin, this has been really helpful. I've reported your observations to our sysadmins and they fixed it by modifying the Apache config file on our server, replacing the line:

    AddType application/x-gzip .gz .tgz

by

    AddType application/x-gzip .gz .tgz .bam

and now it works. I'm posting this in case somebody else runs into the same problem in the future. As far as I know, BAM files do use a block-compression format (BGZF) that is compatible with GZIP, so I guess curl needs to know about this. I also guess that the Microsoft Azure Data Lake Cloud is already configured to serve that information, which is why the BAM files hosted there were downloading fine.

robert.

On 3/30/23 03:09, Martin Morgan wrote:
>
> Some more not-necessarily helpful observations. You can get verbose
> output with
>
>     curl::curl_fetch_disk(url, tempfile(), handle = curl::new_handle(verbose = TRUE))
>
> and on the command line with curl -v -L …
>
> Also, it seems that other BAM files can be downloaded, e.g., from
> eh[["EH3502"]] (also httr::with_verbose(eh[["EH3502"]])). It would be
> worthwhile to verify this a little more completely; I looked for
>
>     mcols(eh) |> as_tibble(rownames = "ehid") |> filter(sourcetype == "BAM", rdataclass == "BamFile")
>
> If it’s true that other BAM files are OK, then it points to the way
> the files are being served on ‘your’ end.
>
> One difference I see is that ‘your’ files have Content-Encoding: gzip,
> but there is no Content-Encoding tag on the BAM file above.
> I guess BAM files are (some flavor of) gzip (?), but maybe this is
> confusing the R curl library…
>
> Martin
>
> *From: *Robert Castelo <robert.cast...@upf.edu>
> *Date: *Wednesday, March 29, 2023 at 4:08 PM
> *To: *Martin Morgan <mtmorgan.b...@gmail.com>, bioc-devel@r-project.org <bioc-devel@r-project.org>
> *Subject: *Re: [Bioc-devel] httr::GET() problem downloading a ExperimentHub resource
>
> good catch, but really enigmatic: BAI files work, but BAM files don't:
>
>     dat <- read.csv("https://raw.githubusercontent.com/functionalgenomics/gDNAinRNAseqData/devel/inst/extdata/metadata_LiYu22subsetBAMfiles.csv")
>     ## each RDataPath holds a "BAM:BAI" pair of relative paths
>     rdatapath <- strsplit(dat$RDataPath, ":")
>     bamfiles <- vapply(rdatapath, `[`, character(1), 1)
>     baifiles <- vapply(rdatapath, `[`, character(1), 2)
>
>     bamurls <- paste0(dat$Location_Prefix, bamfiles)
>     baiurls <- paste0(dat$Location_Prefix, baifiles)
>
>     ## BAM files give an error
>     for (bf in bamurls) {
>       cat(sprintf("%s\n", basename(bf)))
>       tryCatch({
>         curl::curl_fetch_disk(bf, tempfile())
>       }, error = function(e) message(conditionMessage(e)))
>     }
>
>     ## BAI files do not give an error
>     for (bf in baiurls) {
>       cat(sprintf("%s\n", basename(bf)))
>       tryCatch({
>         curl::curl_fetch_disk(bf, tempfile())
>       }, error = function(e) message(conditionMessage(e)))
>     }
>
> any further ideas?
>
> robert.
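Martin's observation in this thread is that the failing BAM URLs were served with a `Content-Encoding: gzip` header, while a BAM file that downloads fine carries no such header. A minimal base-R sketch for checking this from raw response header lines follows; the helper names `content_encoding()` and `is_gzip_encoded()` are hypothetical and not part of curl or httr, and the sketch assumes the header lines have already been captured (e.g. from `curl -I` output).

```r
# Hypothetical helpers (not part of curl or httr): inspect raw HTTP response
# header lines, such as those printed by `curl -I` or a verbose curl handle.
content_encoding <- function(header_lines) {
  # HTTP header names are case-insensitive
  hits <- grep("^content-encoding:", header_lines,
               ignore.case = TRUE, value = TRUE)
  # Drop the header name, then any trailing "\r" and surrounding blanks
  tolower(trimws(sub("^[^:]*:", "", hits)))
}

# TRUE if the response body is labelled as gzip-compressed in transit
is_gzip_encoded <- function(header_lines) {
  any(content_encoding(header_lines) %in% c("gzip", "x-gzip"))
}
```

With network access, the header lines could be obtained in R with something like `curl::parse_headers(curl::curl_fetch_memory(url, handle = curl::new_handle(nobody = TRUE))$headers)`, where `nobody = TRUE` asks for the headers only. A server that labels a BGZF file as gzip-encoded plausibly leads a decoding client to try to decompress the body itself; the thread does not confirm that this is the exact failure mechanism.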
> On 29/3/23 21:10, Martin Morgan wrote:
>
> > Not really helpful, but this could be simplified a bit by removing
> > the redirect from ExperimentHub, and the layer from httr to curl, so
> >
> >     url = "https://functionalgenomics.upf.edu/experimenthub/gdnainrnaseqdata/LiYu22subsetBAMfiles/s32gDNA0.bam"
> >     curl::curl_fetch_disk(url, tempfile())
> >     Error in curl::curl_fetch_disk("https://functionalgenomics.upf.edu/experimenthub/gdnainrnaseqdata/LiYu22subsetBAMfiles/s32gDNA0.bam",  :
> >       Failed writing received data to disk/application
> >
> > I notice the index file (extension .bai) works; do other BAM files
> > work, too?
> >
> > Martin
> >
> > *From: *Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of Robert Castelo <robert.cast...@upf.edu>
> > *Date: *Wednesday, March 29, 2023 at 1:18 PM
> > *To: *bioc-devel@r-project.org <bioc-devel@r-project.org>
> > *Subject: *[Bioc-devel] httr::GET() problem downloading a ExperimentHub resource
> >
> > hi,
> >
> > we recently added a few new ExperimentHub resources, consisting of
> > BAM files and their corresponding BAI files, hosted on my own server.
> > while they seem to be accessible, they cannot be downloaded
> > through the ExperimentHub API.
> > the minimal example reproducing the problem is this one (using Bioc devel):
> >
> >     library(ExperimentHub)
> >     httr::GET("https://experimenthub.bioconductor.org/fetch/8129")
> >     Error in curl::curl_fetch_memory(url, handle = handle) :
> >       Failed writing received data to disk/application
> >
> > while there's apparently no problem "manually" downloading the
> > resource using 'download.file()' and loading it with
> > 'GenomicAlignments::readGAlignments()':
> >
> >     download.file("https://experimenthub.bioconductor.org/fetch/8129", "file.bam")
> >     trying URL 'https://experimenthub.bioconductor.org/fetch/8129'
> >     Content type 'application/octet-stream' length 13296358 bytes (12.7 MB)
> >     ==================================================
> >     downloaded 12.7 MB
> >
> >     gal <- GenomicAlignments::readGAlignments("file.bam")
> >     gal[1:3]
> >     GAlignments object with 3 alignments and 0 metadata columns:
> >           seqnames strand       cigar    qwidth     start       end     width     njunc
> >              <Rle>  <Rle> <character> <integer> <integer> <integer> <integer> <integer>
> >       [1]     chr1      +       49M1S        50     16208     16256        49         0
> >       [2]     chr1      +       3S47M        50     16976     17022        47         0
> >       [3]     chr1      -  10M177N40M        50     17046     17272       227         1
> >       -------
> >       seqinfo: 2580 sequences from an unspecified genome
> >
> > any hint why 'httr::GET()' fails, while 'download.file()' doesn't?
> >
> > thanks!!
> >
> > robert.
> >
> > ps: just to clarify, the 'httr::GET()' example is behind the following
> > problem:
> >
> >     eh <- ExperimentHub()
> >     z <- eh[["EH8079"]]
> >     see ?gDNAinRNAseqData and browseVignettes('gDNAinRNAseqData') for documentation
> >     downloading 2 resources
> >     retrieving 2 resources
> >       |======================================================================| 100%
> >
> >     Error: failed to load resource
> >       name: EH8079
> >       title: RNA-seq data BAM file subset of HRR589632 contaminated with 0% gDNA
> >       reason: 1 resources failed to download
> >     In addition: Warning messages:
> >     1: download failed
> >       web resource path: ‘https://experimenthub.bioconductor.org/fetch/8129’
> >       local file path: ‘/home/rcastelo/.cache/R/ExperimentHub/12ba1aa03_8129’
> >       reason: Failed writing received data to disk/application
> >     2: bfcadd() failed; resource removed
> >       rid: BFC3
> >       fpath: ‘https://experimenthub.bioconductor.org/fetch/8129’
> >       reason: download failed
> >     3: download failed
> >       hub path: ‘https://experimenthub.bioconductor.org/fetch/8129’
> >       cache resource: ‘EH8079 : 8129’
> >       reason: bfcadd() failed; see warnings()
> >
> > _______________________________________________
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> --
> Robert Castelo, PhD
> Associate Professor
> Dept. of Medicine and Life Sciences
> Universitat Pompeu Fabra (UPF)
> Barcelona Biomedical Research Park (PRBB)
> Dr Aiguader 88
> E-08003 Barcelona, Spain
> telf: +34.933.160.514
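Robert's reply attributes the behaviour to BAM files using a gzip-compatible block compression (BGZF). As a hedged sanity check, here is a base-R sketch, with a hypothetical helper name, that tests whether a file begins with a BGZF block header: the gzip magic bytes, the FEXTRA flag and the 'BC' extra subfield described in the SAM/BAM format specification. It could be applied, for instance, to the file.bam obtained with download.file() in the original message.

```r
# Hypothetical helper (not from Rsamtools or any package): TRUE if the file
# starts with a BGZF block header as laid out in the SAM/BAM specification.
is_bgzf <- function(path) {
  # Binary mode with raw = TRUE: read the bytes as stored on disk,
  # without any transparent decompression by R
  con <- file(path, open = "rb", raw = TRUE)
  on.exit(close(con))
  # The fixed part of a BGZF block header is 18 bytes long
  hdr <- as.integer(readBin(con, what = "raw", n = 18L))
  if (length(hdr) < 18L) return(FALSE)
  hdr[1] == 0x1f && hdr[2] == 0x8b &&   # gzip magic number (ID1, ID2)
    hdr[3] == 8L &&                     # CM: deflate
    bitwAnd(hdr[4], 4L) != 0L &&        # FLG: FEXTRA bit is set
    hdr[13] == 0x42 && hdr[14] == 0x43  # extra subfield ID: 'B', 'C'
}
```

Usage: `is_bgzf("file.bam")` should return TRUE for an intact BAM file. A plain gzip file written with R's gzfile() fails the check because it lacks the 'BC' subfield, which illustrates that every BGZF file is valid gzip but not every gzip file is BGZF.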