Ludo,
Thanks for your bug report. As Harris mentioned in a private e-mail, there was an issue at the C-level that resulted in the Hamming distance being inappropriately capped at 1. I just fixed this in BioC 2.6 (Biostrings 2.16.6) and BioC 2.7 (Biostrings 2.17.8). You can obtain these new versions from svn directly now, or wait approximately 24-36 hours to download them via bioconductor.org and biocLite.


> words <- c("lazy", "hazy", "dasy")
> stringDist(words, method='hamming')
  1 2
2 1
3 2 2
> as.matrix(stringDist(words, method='hamming'))
  1 2 3
1 0 1 2
2 1 0 2
3 2 2 0
> sessionInfo()
R version 2.11.1 Patched (2010-05-31 r52167)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.16.5 IRanges_1.6.8

loaded via a namespace (and not attached):
[1] Biobase_2.8.0
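
If you want to cross-check the numbers independently of the installed Biostrings version, here is a minimal base R sketch. The helper name hammingDist is made up for illustration and is not part of Biostrings. It assumes all strings have the same length and simply counts mismatching positions for every pair.

```r
# Hypothetical helper (not part of Biostrings): pairwise Hamming distance
# for equal-length strings, using base R only.
hammingDist <- function(words) {
  # All strings must have the same number of characters.
  stopifnot(length(unique(nchar(words))) == 1L)
  chars <- strsplit(words, "")  # one character vector per string
  n <- length(words)
  d <- matrix(0L, n, n, dimnames = list(seq_len(n), seq_len(n)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # Count the positions at which the two strings differ.
      d[i, j] <- sum(chars[[i]] != chars[[j]])
    }
  }
  d
}

words <- c("lazy", "hazy", "dasy")
as.dist(hammingDist(words))  # lower triangle, same layout as stringDist()
```

For the three words above this gives 1 for lazy/hazy and 2 for both lazy/dasy and hazy/dasy, which agrees with the corrected stringDist output. The double loop is quadratic in the number of strings, so it is only meant as a sanity check on small inputs.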



Patrick



On 6/21/10 7:15 AM, Ludo Pagie wrote:
Hi all,

I want to calculate the Hamming distance between equal-length
strings, i.e., the number of substitution differences between two
strings.
From the help page of 'stringDist' I think the following should
return the same results, but they don't. What am I doing/seeing
wrong?

words<- c("lazy", "hazy", "dasy")
sapply(words, neditStartingAt,'lazy',starting.at=1)
lazy hazy dasy
    0    1    2
stringDist(words,method='hamming')
  1 2
2 1
3 1 1

I want the result as returned by neditStartingAt, clearly.

sessionInfo()
R version 2.12.0 Under development (unstable) (2010-06-17 r52313)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8       LC_NAME=C
[9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.17.7 IRanges_1.7.7

loaded via a namespace (and not attached):
[1] Biobase_2.9.0 tools_2.12.0

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

