Great, thanks!

Thomas

On Wed, Feb 21, 2018 at 3:11 AM Martin Morgan <martin.mor...@roswellpark.org> wrote:

> Thanks Thomas, countLines() (in ShortRead 1.37.3 and later) will return
> numeric() rather than integer() and hence support large files.
>
> Martin
>
> On 02/20/2018 10:08 PM, Thomas Girke wrote:
> > Dear Martin,
> >
> > countLines in ShortRead returns the line counts as integers, which appears
> > to create problems with large FASTQ files (>536.8 million reads, i.e. more
> > than 2^31-1 lines at four lines per read) due to R's integer limit
> > (2^31-1). When the integer limit is reached or exceeded, it seems that
> > countLines returns negative values that no longer reflect the number of
> > lines in a file. At least this is what I learned after several users
> > reported this problem and I then ran some tests myself on large FASTQ
> > files with line counts around the integer limit. If my conclusion is
> > correct and there aren't any strong reasons against it, would it be
> > possible to consider returning numeric values instead, either by default
> > or conditionally (e.g. when the count is >= .Machine$integer.max), to lift
> > this limit? If this is not possible, then returning NAs instead of
> > negative values would be a sensible compromise.
> >
> > Thanks,
> >
> > Thomas
> >
> > > sessionInfo()
> > R version 3.4.2 (2017-09-28)
> > Platform: x86_64-pc-linux-gnu (64-bit)
> > Running under: CentOS Linux 7 (Core)
> >
> > Matrix products: default
> > BLAS: /usr/lib64/libblas.so.3.4.2
> > LAPACK: /usr/lib64/liblapack.so.3.4.2
> >
> > locale:
> > [1] C
> >
> > attached base packages:
> > [1] stats4 parallel stats graphics utils datasets grDevices
> >     methods base
> >
> > other attached packages:
> > [1] ShortRead_1.36.0 GenomicAlignments_1.14.1
> >     SummarizedExperiment_1.8.0 DelayedArray_0.4.1 matrixStats_0.52.2
> >     Biobase_2.38.0 Rsamtools_1.30.0
> >     GenomicRanges_1.30.0 GenomeInfoDb_1.14.0 Biostrings_2.46.0
> >     XVector_0.18.0 IRanges_2.12.0
> >     S4Vectors_0.16.0
> > [14] BiocParallel_1.12.0 BiocGenerics_0.24.0 setwidth_1.0-4
> >     colorout_1.1-3
> >
> > loaded via a namespace (and not attached):
> > [1] zlibbioc_1.24.0 lattice_0.20-35 hwriter_1.3.2
> >     tools_3.4.2 grid_3.4.2 latticeExtra_0.6-28
> >     Matrix_1.2-12 GenomeInfoDbData_0.99.1 RColorBrewer_1.1-2
> >     bitops_1.0-6 RCurl_1.95-4.8 compiler_3.4.2
> >
> > _______________________________________________
> > Bioc-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
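
For readers following the arithmetic in this thread, here is a minimal base-R sketch of the integer limit being discussed. It does not call countLines() or any ShortRead function; the variable names are hypothetical, and the four-lines-per-read figure assumes standard FASTQ records. Note that at the R level integer overflow yields NA with a warning; the negative counts reported above are not reproduced here.

```r
# Base R only; all names below are illustrative, not part of ShortRead.
int_max <- .Machine$integer.max       # 2^31 - 1
lines_per_read <- 4                   # assumption: standard four-line FASTQ records

# Largest number of reads whose total line count still fits in an R integer
max_reads <- int_max %/% lines_per_read

# Integer arithmetic past the limit gives NA (R also emits a warning)
over_int <- suppressWarnings(int_max + 1L)

# The same count held as a double (numeric) is represented exactly
over_dbl <- int_max + 1

print(max_reads)
print(is.na(over_int))
print(over_dbl == 2^31)
```

This is why a numeric() return value, as described for ShortRead 1.37.3 and later, can report line counts beyond 2^31-1.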