Hi Felix,

Please see my answer in the issue you opened on GitHub:



On 04/02/2018 06:07 AM, Felix Ernst wrote:
Dear all,

this is probably a question for Hervé Pagès:

I tried the following code, which, according to ?AAString, should not work, since
Ü, Ö and Ä are not part of any amino acid code.

   3-letter "AAString" instance
seq: ÜÄÖ
R version 3.4.4 (2018-03-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] Biostrings_2.46.0   XVector_0.18.0      IRanges_2.12.0
[4] S4Vectors_0.16.0    BiocGenerics_0.24.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.24.0 compiler_3.4.4  tools_3.4.4     yaml_2.1.18

I don’t have access to the devel version of Biostrings right now, but I checked
out the current code in the GitHub repo and its recent changes. I am fairly
sure that this behavior is also present in the current devel branch. Can someone
confirm this?

My current interest is in using the XString classes and methods for an
additional biological string representation. My initial question was: how can
I restrict this to a certain character set if the characters are not stored
byte encoded? Byte encoding is not an option for me, since characters like
"«" or "=" result in a two-byte code when passed to charToRaw(). This trips
up the construction of the internal lookup tables, which are passed down to the
C backend.
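The following is a minimal sketch of why a multi-byte character breaks a byte-indexed lookup table. The 256-slot table and the encode_bytes() helper are hypothetical illustrations, not the actual Biostrings implementation. The sketch assumes UTF-8 encoded strings, where « (U+00AB) occupies two bytes.

```r
# Hypothetical byte-indexed lookup table: 256 slots, one per possible byte
# value. This is an illustration, not the actual Biostrings implementation.
alphabet <- enc2utf8(c("A", "C", "\u00ab"))  # "\u00ab" is «, two bytes in UTF-8
lookup <- rep(NA_integer_, 256)

for (i in seq_along(alphabet)) {
  b <- as.integer(charToRaw(alphabet[i]))
  # A single-byte character fills exactly one slot. A multi-byte character
  # would need one slot per byte, so it cannot be stored as a single entry
  # and is skipped here.
  if (length(b) == 1L) lookup[b + 1L] <- i
}

# Encode a string byte by byte (R indices are 1-based, hence the + 1L).
encode_bytes <- function(x) lookup[as.integer(charToRaw(enc2utf8(x))) + 1L]

print(encode_bytes("AC"))        # one code per letter
print(encode_bytes("A\u00ab"))   # « becomes two unmatched bytes
```

Each single-byte letter maps to exactly one slot. « is split into two bytes that each miss the table, which is the kind of breakage described above.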

I therefore looked into how an AAString differs from a BString in this respect,
and discovered that currently it doesn't. I also looked at the current 2.47.12
code in the repo, which, as far as I can tell, does not use the AMINO_ACID_CODE
constant when an AAString object is created.
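One way to restrict a string to a fixed character set without relying on byte values is to validate it character by character before constructing the object. The sketch below uses base R only. validate_alphabet() is a hypothetical helper, not a Biostrings function, and the five-letter alphabet is an illustrative subset. For amino acids, one might instead derive the alphabet from a constant such as AMINO_ACID_CODE.

```r
# Hypothetical helper (not part of Biostrings): reject any character that is
# not in the allowed alphabet. Splitting is done per character, not per byte,
# so a multi-byte UTF-8 symbol counts as a single letter.
validate_alphabet <- function(x, alphabet) {
  x <- enc2utf8(x)
  alphabet <- enc2utf8(alphabet)
  chars <- strsplit(x, "")[[1]]  # individual characters
  bad <- unique(chars[!chars %in% alphabet])
  if (length(bad) > 0) {
    stop("invalid character(s): ", paste(bad, collapse = ", "))
  }
  invisible(x)
}

aa_alphabet <- c("A", "C", "D", "E", "G")  # illustrative subset only

validate_alphabet("ACDEG", aa_alphabet)  # passes silently

# A string containing Ü (U+00DC) is rejected.
res <- tryCatch(validate_alphabet("A\u00dc", aa_alphabet),
                error = function(e) conditionMessage(e))
print(res)
```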

So my questions are:
- What is the best practice for deriving a class from XString with a
restricted character set that is not byte encoded?
- Is there a way to use byte encoding for characters that take two or more bytes?

Thanks in advance for any help and suggestions.

Best regards,

PS: Regarding the second question, one could change

    as.integer(charToRaw(paste(letters, collapse="")))

to

    lapply(lapply(letters, charToRaw), as.integer)

in .letterAsByteVal, but either way the result would no longer be an atomic
vector, which I think the C backend requires. I didn't test it.
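The point in the PS can be checked in base R. Once the input contains a multi-byte character, the per-character byte codes form a list whose elements have unequal lengths, not an atomic integer vector. The input letters, including « (U+00AB), are assumed examples, and the sketch does not reproduce .letterAsByteVal itself.

```r
# Per-character byte codes, built as suggested in the PS.
chars <- enc2utf8(c("a", "b", "\u00ab"))  # "\u00ab" is «, two bytes in UTF-8
byte_codes <- lapply(lapply(chars, charToRaw), as.integer)
str(byte_codes)

is.atomic(byte_codes)  # FALSE: a list, not an integer vector
lengths(byte_codes)    # 1 1 2: « contributes two integers

# Flattening restores an atomic vector but loses the character boundaries.
flat <- unlist(byte_codes)
length(flat)           # 4 values for 3 characters
```

unlist() would satisfy an "atomic vector" requirement, but the position-to-character correspondence is lost. That is presumably why this change alone would not be enough.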

Bioc-devel@r-project.org mailing list

Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
