This file trips up fread around record 170349, but inconsistently; I haven't figured out why yet. Reading with readLines and splitting fields with strsplit may be the ultimate solution.
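
For what it's worth, here is a minimal sketch of the readLines/strsplit idea. It assumes the file is plain tab-delimited with one header row and no tabs or newlines embedded inside fields, which I have not verified for the full catalog. The demo file and the pad() helper are made up for illustration; the quote character is treated as ordinary text, so an unbalanced quote cannot swallow later records.

```r
# Hypothetical demo file: a header plus three tab-separated records,
# one of which contains an unbalanced double quote.
tsv <- tempfile(fileext = ".tsv")
writeLines(c("id\tregion\tnote",
             "1\tchr11:102751102\"-?\tfirst",
             "2\tchr1:5\tsecond",
             "3\tchr2:7\t"),
           tsv)

lines  <- readLines(tsv)
fields <- strsplit(lines, "\t", fixed = TRUE)  # quotes are just characters here
nfield <- length(fields[[1]])                  # width taken from the header

# strsplit() drops a trailing empty field, so pad short records to header width.
pad <- function(x) c(x, rep("", max(0, nfield - length(x))))

body   <- do.call(rbind, lapply(fields[-1], pad))
parsed <- as.data.frame(body, stringsAsFactors = FALSE)
names(parsed) <- fields[[1]]

nrow(parsed)   # one row per record, regardless of the stray quote
```

All columns come back as character, so numeric columns would still need converting afterwards (for example with type.convert).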
On Thu, Apr 30, 2020 at 7:15 AM Vincent Carey <st...@channing.harvard.edu> wrote:

> right, line 35265 of
> http://www.ebi.ac.uk/gwas/api/search/downloads/alternative has an
> unclosed quote in a field.
>
> 35265 2019-04-10 30804558 Grove J 2019-02-25 Nat Genet
> www.ncbi.nlm.nih.gov/pubmed/30804558 Identification of
> common genetic risk variants for autism spectrum disorder. Autism
> spectrum disorder 18,381 European ancestry cases, 27,969
> European ancestry controls 2,119 European ancestry cases, 142,379
> European ancestry controls Intergenic
>
> chr11:102751102"-? chr11:102751102 0 1 0.037
> 8E-6 5.096910013008056 1.1641443 [NR]
> Illumina
> [9112387] (imputed) N autism spectrum disorder
> http://www.ebi.ac.uk/efo/EFO_0003756 GCST007556 Genome-wide
> genotyping array
>
> On Thu, Apr 30, 2020 at 6:59 AM Martin Morgan <mtmorgan.b...@gmail.com>
> wrote:
>
>> I'd look instead at or around line 35264 for use of quotes, e.g., "3'
>> DNA", and change the argument read.delim(quote = "") (though I never get
>> that right so probably wrong again...). A comment character might also be a
>> problem.
>>
>> If you point to the location of the file I could investigate further...
>>
>> Martin
>>
>> On 4/30/20, 6:55 AM, "Bioc-devel on behalf of Vincent Carey" <
>> bioc-devel-boun...@r-project.org on behalf of st...@channing.harvard.edu>
>> wrote:
>>
>>     The EBI GWAS catalog is large -- now the download is over 100MB for
>>     179K associations. A "bug" in the package was reported, so I
>>     acquired the file by hand.
>>
>>     > nn = read.delim("gwas_catalog_v1.0.2-associations_e98_r2020-03-08.tsv",
>>     sep="\t")
>>
>>     Warning message:
>>     In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
>>       EOF within quoted string
>>
>>     > dim(nn)
>>     [1] 35264 38
>>
>>     The "bug" is the number 35264 ...
>>
>>     >
>>     [1]+ Stopped R
>>
>>     %vjcair> wc gwas_cat*tsv
>>     179365 13243516 120140148
>>     gwas_catalog_v1.0.2-associations_e98_r2020-03-08.tsv
>>
>>     %vjcair> vi gwas_cat*tsv
>>     %vjcair> fg
>>     R
>>
>>     > tail(nn)
>>     Error: C stack usage 98161262 is too close to the limit
>>
>>     Maybe my R needs to be updated.
>>
>>     If I use data.table::fread to consume the tsv over HTTP all seems
>>     well, and perhaps I will switch to that.
>>
>> --
>> The information in this e-mail is intended only for the
>> ...{{dropped:18}}
>>
>> _______________________________________________
>> Bioc-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel