Thanks Martin, this has been really helpful. I reported your 
observations to our sysadmins and they fixed it by modifying the Apache 
config file on our server, replacing the line:

AddType application/x-gzip .gz .tgz

with

AddType application/x-gzip .gz .tgz .bam

and now it works. I'm posting this in case somebody else runs into this 
problem in the future.
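
To double-check what the server sends for a given file, one can look at the 
response headers. Below is a minimal base-R sketch, not something taken from 
our actual setup: `content_headers()` is a hypothetical helper name, and the 
header lines in the example are made up for illustration.

```r
# Hypothetical helper (not part of curl or httr): given raw HTTP response
# header lines, return the Content-Type and Content-Encoding values
# (NA when a header is absent).
content_headers <- function(header_lines) {
  get_field <- function(name) {
    pattern <- paste0("^", name, ":[[:space:]]*")
    hit <- grep(pattern, header_lines, ignore.case = TRUE, value = TRUE)
    if (length(hit) == 0L) return(NA_character_)
    # keep the last match and strip the field name and surrounding whitespace
    trimws(sub(pattern, "", hit[length(hit)], ignore.case = TRUE))
  }
  c(type = get_field("content-type"),
    encoding = get_field("content-encoding"))
}

# Illustrative (made-up) header lines for a response that declares gzip encoding
example_headers <- c("HTTP/1.1 200 OK",
                     "Content-Type: application/octet-stream",
                     "Content-Encoding: gzip")
content_headers(example_headers)
```

With the curl package installed, the header lines of a real response should 
be obtainable with `curl::parse_headers(res$headers)`, where `res` comes from 
`curl::curl_fetch_memory(url, handle = curl::new_handle(nobody = TRUE))`; that 
call is left out of the block so that it runs offline.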

As far as I know, BAM files do indeed use a block-compression format 
(BGZF) compatible with gzip, so I guess curl needs to know about this. I 
also guess the Microsoft Azure Data Lake cloud is already configured to 
serve that information, which is why the BAM files there were downloading 
fine.
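
The block-compression format in question is BGZF, described in the SAM/BAM 
specification: each block is a gzip member whose extra field carries a 'BC' 
subfield. A minimal base-R sketch to check whether a local file starts that 
way follows. `looks_like_bgzf()` is a hypothetical helper name, the check 
assumes the 'BC' subfield comes first in the extra field (the usual layout), 
and the demo file is a synthetic 18-byte header, not a real BAM file.

```r
# Hypothetical helper: TRUE when the file starts with a gzip member whose
# extra field begins with the BGZF 'BC' subfield, as in typical BAM files.
looks_like_bgzf <- function(path) {
  hdr <- readBin(path, what = "raw", n = 14L)
  length(hdr) == 14L &&
    # gzip magic bytes, deflate method, FEXTRA flag set
    identical(hdr[1:4], as.raw(c(0x1f, 0x8b, 0x08, 0x04))) &&
    # BGZF subfield identifier at the start of the extra field
    identical(hdr[13:14], charToRaw("BC"))
}

# Synthetic BGZF block header written to a temporary file for the demo:
# magic (4 bytes), MTIME (4), XFL, OS, XLEN = 6, 'B', 'C', SLEN = 2, BSIZE (2)
tmp <- tempfile(fileext = ".bam")
writeBin(as.raw(c(0x1f, 0x8b, 0x08, 0x04, 0, 0, 0, 0, 0, 0xff,
                  6, 0, 0x42, 0x43, 2, 0, 0, 0)), tmp)
looks_like_bgzf(tmp)
```

On a real download, one would call `looks_like_bgzf("file.bam")` on the 
saved file instead of the temporary one.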

robert.

On 3/30/23 03:09, Martin Morgan wrote:
>
> Some more not-necessarily helpful observations. You can get verbose 
> output with
>
> curl::curl_fetch_disk(url, tempfile(),
>                       handle = curl::new_handle(verbose = TRUE))
>
> and on the command line with curl -v -L …
>
> Also, it seems that other BAM files can be downloaded, e.g., from 
> eh[["EH3502"]] (also httr::with_verbose(eh[["EH3502"]])). It would be 
> worthwhile to verify this a little more completely; I looked for
>
> mcols(eh) |> as_tibble(rownames = "ehid") |>
>     filter(sourcetype == "BAM", rdataclass == "BamFile")
>
> If it’s true that other BAM files are ok, then it points to the way 
> the files are being served on ‘your’ end.
>
> One difference I see is that ‘your’ files have Content-Encoding: gzip, 
> but there is no Content-Encoding tag on the BAM file above. I guess 
> BAM files are (some flavor of) gzip (?), but maybe this is confusing 
> the R curl library…
>
> Martin
>
> *From: *Robert Castelo <robert.cast...@upf.edu>
> *Date: *Wednesday, March 29, 2023 at 4:08 PM
> *To: *Martin Morgan <mtmorgan.b...@gmail.com>, 
> bioc-devel@r-project.org <bioc-devel@r-project.org>
> *Subject: *Re: [Bioc-devel] httr::GET() problem downloading a 
> ExperimentHub resource
>
> good catch, but really enigmatic, BAI files work, but BAM don't:
>
> dat <- read.csv("https://raw.githubusercontent.com/functionalgenomics/gDNAinRNAseqData/devel/inst/extdata/metadata_LiYu22subsetBAMfiles.csv")
> rdatapath <- strsplit(dat$RDataPath, ":")
> bamfiles <- unlist(rdatapath)[seq(1, 18, 2)]
> baifiles <- unlist(rdatapath)[seq(2, 18, 2)]
>
> bamurls <- paste0(dat$Location_Prefix, bamfiles)
> baiurls <- paste0(dat$Location_Prefix, baifiles)
>
> ## BAM files give error
> for (bf in bamurls) {
>   cat(sprintf("%s\n", basename(bf)))
>   tryCatch({
>     curl::curl_fetch_disk(bf, tempfile())
>   }, error=function(e) message(paste0(e, "\n")))
> }
>
> ## BAI files do not give error
> for (bf in baiurls) {
>   cat(sprintf("%s\n", basename(bf)))
>   tryCatch({
>     curl::curl_fetch_disk(bf, tempfile())
>   }, error=function(e) message(paste0(e, "\n")))
> }
>
> any further ideas?
>
> robert.
>
> On 29/3/23 21:10, Martin Morgan wrote:
>
>     Not really helpful but this could be simplified a bit by removing
>     the redirect from experiment hub, and the layer from httr to curl, so
>
>     url = "https://functionalgenomics.upf.edu/experimenthub/gdnainrnaseqdata/LiYu22subsetBAMfiles/s32gDNA0.bam"
>
>     curl::curl_fetch_disk(url, tempfile())
>
>     Error in curl::curl_fetch_disk("https://functionalgenomics.upf.edu/experimenthub/gdnainrnaseqdata/LiYu22subsetBAMfiles/s32gDNA0.bam",  :
>
>       Failed writing received data to disk/application
>
>     I notice the index file (extension .bai) works; do other BAM files
>     work, too?
>
>     Martin
>
>     *From: *Bioc-devel <bioc-devel-boun...@r-project.org> on behalf of
>     Robert Castelo <robert.cast...@upf.edu>
>     *Date: *Wednesday, March 29, 2023 at 1:18 PM
>     *To: *bioc-devel@r-project.org <bioc-devel@r-project.org>
>     *Subject: *[Bioc-devel] httr::GET() problem downloading a
>     ExperimentHub resource
>
>     hi,
>
>     we recently added a few new ExperimentHub resources, consisting of
>     BAM files and their corresponding BAI files, hosted on my own server.
>     while they seem to be accessible, they cannot be downloaded
>     through the ExperimentHub API. the minimal example reproducing the
>     problem is this one (using Bioc devel):
>
>     library(ExperimentHub)
>     httr::GET("https://experimenthub.bioconductor.org/fetch/8129")
>     Error in curl::curl_fetch_memory(url, handle = handle) :
>        Failed writing received data to disk/application
>
>     while there's apparently no problem downloading the resource
>     "manually" using 'download.file()' and loading it with
>     'GenomicAlignments::readGAlignments()':
>
>     download.file("https://experimenthub.bioconductor.org/fetch/8129",
>                   "file.bam")
>     trying URL 'https://experimenthub.bioconductor.org/fetch/8129'
>     Content type 'application/octet-stream' length 13296358 bytes (12.7 MB)
>     ==================================================
>     downloaded 12.7 MB
>
>     gal <- GenomicAlignments::readGAlignments("file.bam")
>     gal[1:3]
>     GAlignments object with 3 alignments and 0 metadata columns:
>            seqnames strand       cigar    qwidth     start       end     width     njunc
>               <Rle>  <Rle> <character> <integer> <integer> <integer> <integer> <integer>
>        [1]     chr1      +       49M1S        50     16208     16256        49         0
>        [2]     chr1      +       3S47M        50     16976     17022        47         0
>        [3]     chr1      -  10M177N40M        50     17046     17272       227         1
>        -------
>        seqinfo: 2580 sequences from an unspecified genome
>
>     any hint why 'httr::GET()' fails, while 'download.file()' doesn't?
>
>     thanks!!
>
>     robert.
>     ps: just to clarify, the 'httr::GET()' example is behind the
>     following problem:
>
>     eh <- ExperimentHub()
>     z <- eh[["EH8079"]]
>     see ?gDNAinRNAseqData and browseVignettes('gDNAinRNAseqData') for
>     documentation
>     downloading 2 resources
>     retrieving 2 resources
>     |======================================================================|
>
>     100%
>
>     Error: failed to load resource
>        name: EH8079
>        title: RNA-seq data BAM file subset of HRR589632 contaminated with 0% gDNA
>        reason: 1 resources failed to download
>     In addition: Warning messages:
>     1: download failed
>        web resource path:
>     ‘https://experimenthub.bioconductor.org/fetch/8129’
>        local file path:
>     ‘/home/rcastelo/.cache/R/ExperimentHub/12ba1aa03_8129’
>        reason: Failed writing received data to disk/application
>     2: bfcadd() failed; resource removed
>        rid: BFC3
>        fpath: ‘https://experimenthub.bioconductor.org/fetch/8129’
>        reason: download failed
>     3: download failed
>        hub path: ‘https://experimenthub.bioconductor.org/fetch/8129’
>        cache resource: ‘EH8079 : 8129’
>        reason: bfcadd() failed; see warnings()
>
>
>             [[alternative HTML version deleted]]
>
>     _______________________________________________
>     Bioc-devel@r-project.org mailing list
>     https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
> -- 
> Robert Castelo, PhD
> Associate Professor
> Dept. of Medicine and Life Sciences
> Universitat Pompeu Fabra (UPF)
> Barcelona Biomedical Research Park (PRBB)
> Dr Aiguader 88
> E-08003 Barcelona, Spain
> telf: +34.933.160.514

