Re: [Bioc-sig-seq] ReadFastq error

Hervé Pagès Fri, 19 Feb 2010 16:37:09 -0800

Ramzi,

In case you have trouble or don't want to install R-devel + Bioc-devel,
here is code that should work with release and devel (my sessionInfo
at the end):


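  library(Biostrings)
  # Load as a BStringSet so records with non-DNA letters are not rejected
  bset <- read.BStringSet("path/to/your/file", format="fastq")

  # alphabetFrequency() on a BStringSet has one column per byte value
  # (0-255), so the column for a letter is its byte value + 1
  dnaletter_cols <- as.integer(
      BString(paste(DNA_ALPHABET, collapse=""))) + 1L

  # Number of DNA letters in each record
  ndnaletter_per_string <-
      rowSums(alphabetFrequency(bset)[ , dnaletter_cols])

  # Indices of the records that contain at least 1 non-DNA letter
  which(ndnaletter_per_string != width(bset))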
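
To make the column indexing above easier to follow, here is a minimal base R sketch of the same idea on a toy vector. It does not use Biostrings: byte_frequency() is a hypothetical helper written for this illustration, and dna_letters is an assumed stand-in for DNA_ALPHABET (IUPAC codes plus "-" and "+"), so treat it as a sketch only.

```r
# Assumed stand-in for DNA_ALPHABET (not taken from Biostrings itself)
dna_letters <- c("A", "C", "G", "T", "M", "R", "W", "S", "Y", "K",
                 "V", "H", "D", "B", "N", "-", "+")

# Column k + 1 holds the count for byte value k, hence the "+ 1L"
dna_cols <- utf8ToInt(paste(dna_letters, collapse = "")) + 1L

# Hypothetical helper: one row per string, one column per byte value (0-255).
# Assumes plain ASCII input.
byte_frequency <- function(x) {
  t(vapply(x,
           function(s) tabulate(utf8ToInt(s) + 1L, nbins = 256L),
           integer(256), USE.NAMES = FALSE))
}

reads <- c("ACGTN", "ACGIT", "GGCC")   # toy data; the 2nd read contains "I"
freq  <- byte_frequency(reads)

# Count the DNA letters per read and compare with the read length
n_dna <- rowSums(freq[, dna_cols, drop = FALSE])
bad   <- which(n_dna != nchar(reads))
print(bad)
```

A read is flagged whenever its count of DNA letters is smaller than its length, which is the same test as the which() call above.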

Cheers,
H.

> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.14.12 IRanges_1.4.11

loaded via a namespace (and not attached):
[1] Biobase_2.6.1 tools_2.10.1


Hervé Pagès wrote:

Hi Ramzi,

One thing you can try is loading your fastq file with:

  library(Biostrings)
  bset <- read.BStringSet("path/to/your/file", format="fastq")

Note the use of read.BStringSet() instead of read.DNAStringSet().

Since BString/BStringSet objects are not limited to the DNA alphabet
(see ?DNA_ALPHABET), you should be able to load your file even if
it contains non-DNA letters (unless it has other problems of course).

Then you can do something like:

  ndnaletter_per_string <-
      vcountPDict(BStringSet(DNA_ALPHABET), bset, collapse=2)
  which(ndnaletter_per_string != width(bset))

to extract the list of fastq records (as an integer vector) that
contain at least 1 non-DNA letter. (Note that the code above works
only with R-devel + BioC-devel.)

That way you'll be able to know if you have records like this and
where they are.
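
If Biostrings is not at hand, a rough base R sketch of the same check could look like the following. It assumes every record spans exactly 4 lines (header, sequence, "+", qualities), which is not guaranteed for every fastq file. find_non_dna_records() and the letter pattern are hypothetical names introduced for this illustration, not part of any package.

```r
# Hypothetical helper: indices of the fastq records whose sequence line
# contains a character outside the assumed DNA letters.
find_non_dna_records <- function(path,
                                 dna_pattern = "^[ACGTMRWSYKVHDBN+-]*$") {
  lines <- readLines(path)
  if (length(lines) < 2L) return(integer(0))
  # Assumption: 4 lines per record, sequence on the 2nd line of each record
  seqs <- lines[seq(2L, length(lines), by = 4L)]
  which(!grepl(dna_pattern, seqs))
}

# Toy file with two records; the second sequence contains an "I"
fq <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGT", "+", "IIII",
             "@r2", "ACIT", "+", "IIII"), fq)

bad_records <- find_non_dna_records(fq)
print(bad_records)
```

The returned integers are record numbers, not line numbers, so record i starts at line 4 * (i - 1) + 1 of the file under the 4-line assumption.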

readFastq() won't load a fastq file with non-DNA letters in it.

Cheers,
H.


Ramzi TEMANNI wrote:

Hi,
I'm encountering the following error when trying to load a fastq file:

Error in .local(dirPath, pattern, ...) :
  _DNAencode(): key 73 not in lookup table

Key 73 in the ASCII table is "I" (capital i).
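
As a quick sanity check of that lookup, the key can be decoded in base R. This is only a small sketch: the iupac vector below is an assumed list of IUPAC nucleotide codes written out by hand, not something read from the package.

```r
key <- 73L

# Convert the byte value reported in the error message to its character
letter <- rawToChar(as.raw(key))
print(letter)            # "I"

# Assumed IUPAC nucleotide codes; "I" is not one of them
iupac <- c("A", "C", "G", "T", "M", "R", "W", "S", "Y", "K",
           "V", "H", "D", "B", "N")
is_dna <- letter %in% iupac
print(is_dna)            # FALSE
```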

Has anyone encountered this error before?

Thanks in advance for your help

Regards,
Ramzi

sessionInfo()

R version 2.10.1 (2009-12-14)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.2.0      ShortRead_1.4.0    lattice_0.18-3     BSgenome_1.14.2
[5] Biostrings_2.14.12 IRanges_1.4.11

loaded via a namespace (and not attached):
[1] Biobase_2.6.1 grid_2.10.1   hwriter_1.1   RCurl_1.3-1   XML_2.6-0


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: [email protected]
Phone:  (206) 667-5791
Fax:    (206) 667-1319
