Dear Martin,

countLines in ShortRead returns line counts as integers, which appears to cause problems with large FASTQ files (>536.8 million reads at four lines each) due to R's integer limit (2^31-1). Once that limit is exceeded, countLines seems to return negative values that no longer reflect the number of lines in the file. At least this is what I learned after several users reported the problem and I ran some tests myself on large FASTQ files with line counts around the integer limit.

If my conclusion is correct and there aren't any strong reasons against it, would it be possible to return numeric values instead, either by default or conditionally (e.g. when the count is >= .Machine$integer.max), to lift this limit? If this is not possible, then returning NAs instead of negative values would be a sensible compromise.
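
To illustrate the fallback I have in mind, here is a minimal sketch. Note that sanitizeCounts is a hypothetical helper name, not an existing ShortRead function, and the input vector is simulated rather than real countLines output.

```r
# Hypothetical helper (not part of ShortRead): post-process a vector of line
# counts, converting it to numeric and mapping negative values to NA.
sanitizeCounts <- function(counts) {
    # Switch to double storage while keeping the file names on the vector;
    # doubles represent whole numbers exactly up to 2^53.
    storage.mode(counts) <- "double"
    # A line count can never be negative, so treat such values as overflowed.
    counts[!is.na(counts) & counts < 0] <- NA_real_
    counts
}

# Simulated counts: one plausible value and one negative (overflowed) value.
x <- c(small.fastq = 4000L, large.fastq = -2147483646L)
res <- sanitizeCounts(x)
print(res)

# Adding a double to the integer limit yields a double, so there is no overflow.
beyond <- .Machine$integer.max + 1
print(is.double(beyond) && beyond > .Machine$integer.max)
```

This only hides the symptom after the fact. Returning numeric values from the counting code itself would be the real fix.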

Thanks,
Thomas

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /usr/lib64/libblas.so.3.4.2
LAPACK: /usr/lib64/liblapack.so.3.4.2

locale:
[1] C

attached base packages:
[1] stats4 parallel stats graphics utils datasets grDevices methods base

other attached packages:
 [1] ShortRead_1.36.0 GenomicAlignments_1.14.1 SummarizedExperiment_1.8.0 DelayedArray_0.4.1 matrixStats_0.52.2 Biobase_2.38.0 Rsamtools_1.30.0 GenomicRanges_1.30.0 GenomeInfoDb_1.14.0 Biostrings_2.46.0 XVector_0.18.0 IRanges_2.12.0 S4Vectors_0.16.0
[14] BiocParallel_1.12.0 BiocGenerics_0.24.0 setwidth_1.0-4 colorout_1.1-3

loaded via a namespace (and not attached):
[1] zlibbioc_1.24.0 lattice_0.20-35 hwriter_1.3.2 tools_3.4.2 grid_3.4.2 latticeExtra_0.6-28 Matrix_1.2-12 GenomeInfoDbData_0.99.1 RColorBrewer_1.1-2 bitops_1.0-6 RCurl_1.95-4.8 compiler_3.4.2

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel