Re: [R] Partition vector of strings into lines of preferred width

2022-10-28 Thread Leonard Mada via R-help

Dear Andrew,


Thank you for the fast reply. I forgot about strwrap, though my problem
is slightly different.

I already have the actual vector. Of course, I could first join the
strings, but this involves both a join and a split (done by strwrap).
Maybe it is possible to avoid the join and the split. My second approach
may also be fine, but I have not tested it thoroughly (and I may be
missing an existing solution).
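For comparison, the join-and-split route mentioned above could look like this minimal sketch. The vector `words` is a made-up stand-in for the `str` vector from the original post: `paste()` does the join and `strwrap()` does the split, keeping every line shorter than `width` characters.

```r
# Made-up word vector (a stand-in for the 'str' vector of the original post).
words = c("What", "is", "the", "best", "way", "to", "split/cut", "a",
          "vector", "of", "strings", "into", "lines", "of", "preferred",
          "width?")

# Join: collapse the words into a single string ...
joined = paste(words, collapse = " ")
# ... split: strwrap() re-breaks the string at whitespace to fit width = 30.
lines = strwrap(joined, width = 30)
cat(lines, sep = "\n")
```

The join allocates one long string and strwrap() then splits it into words again, which is the double work the reply wants to avoid.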



Sincerely,


Leonard


On 10/29/2022 12:51 AM, Andrew Simmons wrote:


On Fri, Oct 28, 2022 at 5:42 PM Leonard Mada via R-help wrote:


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] Partition vector of strings into lines of preferred width

2022-10-28 Thread Martin Morgan
> strwrap(text)
[1] "What is the best way to split/cut a vector of strings into lines of"
[2] "preferred width? I have come up with a simple solution, albeit naive,"
[3] "as it involves many arithmetic divisions. I have an alternative idea"
[4] "which avoids this problem. But I may miss some existing functionality!"

Maybe used as

> strwrap(text)  |> paste(collapse = "\n") |> cat("\n")
What is the best way to split/cut a vector of strings into lines of
preferred width? I have come up with a simple solution, albeit naive,
as it involves many arithmetic divisions. I have an alternative idea
which avoids this problem. But I may miss some existing functionality!
>

?

From: R-help on behalf of Leonard Mada via R-help
Date: Friday, October 28, 2022 at 5:42 PM
To: R-help Mailing List 
Subject: [R] Partition vector of strings into lines of preferred width


Re: [R] Partition vector of strings into lines of preferred width

2022-10-28 Thread Andrew Simmons
I would suggest using strwrap(), the documentation at ?strwrap has
plenty of details and examples.
For paragraphs, I would usually do something like:

strwrap(x = , width = 80, indent = 4)
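A filled-in version of the call above, as a sketch: `paragraph` is a hypothetical input supplied for the empty `x` argument, and `width = 40` is chosen only to make the wrapping visible. `indent` applies to the first line of each paragraph; the following lines use `exdent`, which defaults to 0.

```r
# Hypothetical input paragraph (any character vector works as x).
paragraph = paste("What is the best way to split/cut a vector of strings",
                  "into lines of preferred width? I have come up with a",
                  "simple solution, albeit naive.")

# Indent the first line by 4 spaces; later lines start at column 1 (exdent = 0).
wrapped = strwrap(x = paragraph, width = 40, indent = 4)
cat(wrapped, sep = "\n")
```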

On Fri, Oct 28, 2022 at 5:42 PM Leonard Mada via R-help wrote:


[R] Partition vector of strings into lines of preferred width

2022-10-28 Thread Leonard Mada via R-help

Dear R-Users,

text = "
What is the best way to split/cut a vector of strings into lines of 
preferred width?
I have come up with a simple solution, albeit naive, as it involves many 
arithmetic divisions.

I have an alternative idea which avoids this problem.
But I may miss some existing functionality!"

# Long vector of strings:
str = strsplit(text, " |(?<=\n)", perl=TRUE)[[1]];
lenWords = nchar(str);

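# simple, but naive solution:
# - it involves many divisions;
# - the spaces between words are not counted;
cut.character.int = function(n, w) {
    ncm = cumsum(n);            # cumulative number of characters
    nwd = ncm %/% w;            # bin (multiple of w) reached by each word
    count = rle(nwd)$lengths;   # number of words in each bin
    pos = cumsum(count);        # last word of each line
    posS = pos[ - length(pos)] + 1;
    posS = c(1, posS);          # first word of each line
    pos = rbind(posS, pos);
    return(pos);
}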

npos = cut.character.int(lenWords, w=30);
# let's print the results (one line per column of npos):
for(id in seq_len(ncol(npos))) {
    words = str[seq(npos[1, id], npos[2, id])];
    cat(paste(words, collapse = " "), "\n", sep = "");
}
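To see what the bucketing does, here is the same function (repeated so the snippet runs on its own) applied to a tiny vector of made-up word lengths. Each word lands in bin cumsum %/% w, so the bins are fixed multiples of w along the running total rather than a per-line budget, and the separating spaces are not counted: the second line below needs 6 characters although w = 5.

```r
cut.character.int = function(n, w) {
    ncm = cumsum(n)                      # running total of characters
    count = rle(ncm %/% w)$lengths       # words per (non-empty) bin
    pos = cumsum(count)                  # last word of each line
    posS = c(1, pos[-length(pos)] + 1)   # first word of each line
    rbind(posS, pos)
}

n = c(4, 2, 3, 4, 3)      # made-up word lengths
cumsum(n)                 # 4 6 9 13 16
cumsum(n) %/% 5           # 0 1 1 2 3
npos = cut.character.int(n, w = 5)

# Printed width of each line if its words are joined by single spaces.
lineWidths = apply(npos, 2, function(p) sum(n[p[1]:p[2]]) + (p[2] - p[1]))
lineWidths                # 4 6 4 3
```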


The first solution performs an arithmetic division on every cumulative
length. It is possible to compute the total length and divide only the
last element of the cumsum (to get the number of bins). Something like
this should work (although it is not properly tested).



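w = 30;
cumlen = cumsum(lenWords);
# the only division: number of bins of width w needed to cover the total;
maxBin = tail(cumlen, 1) %/% w + 1;
# right = FALSE gives intervals [a, b), matching ncm %/% w above;
pos = cut(cumlen, seq(0, maxBin) * w, right = FALSE);
count = rle(as.numeric(pos))$lengths;
# everything else is the same;
pos = cumsum(count);
posS = pos[ - length(pos)] + 1;
posS = c(1, posS);
pos = rbind(posS, pos);

npos = pos; # then print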
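A small check of the interval boundaries when swapping %/% for cut(), on made-up word lengths (`byDivision`, `byCutRight` and `byCutLeft` are illustrative names). By default cut() builds right-closed intervals (a, b], whereas cumsum %/% w sends a running total that lands exactly on a multiple of w into the next bin; with right = FALSE the intervals become [a, b) and the groupings agree. Under the default, a running total of 0 (e.g. from a leading empty string) also falls outside (0, w] and becomes NA.

```r
n = c(3, 2, 1)                     # made-up word lengths; cumsum = 3 5 6
w = 5
cumlen = cumsum(n)
maxBin = tail(cumlen, 1) %/% w + 1
breaks = seq(0, maxBin) * w        # 0 5 10

byDivision = cumlen %/% w                                    # 0 1 1
byCutRight = as.integer(cut(cumlen, breaks))                 # 1 1 2
byCutLeft  = as.integer(cut(cumlen, breaks, right = FALSE))  # 1 2 2

rle(byDivision)$lengths   # 1 2: lines {1}, {2, 3}
rle(byCutRight)$lengths   # 2 1: lines {1, 2}, {3}
rle(byCutLeft)$lengths    # 1 2: same grouping as %/%
```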


The cut() call could probably be optimized as well, since the cumsum is
sorted in ascending order. I have not evaluated the efficiency of the
code either.
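One base R candidate for exploiting the sorted breakpoints is findInterval(), sketched here on made-up word lengths. findInterval(x, vec) returns the index i with vec[i] <= x < vec[i + 1], i.e. left-closed intervals like %/%, so it reproduces the bins of the first solution (shifted by one). No timings were measured for this sketch.

```r
n = c(4, 2, 3, 4, 3)       # made-up word lengths; cumsum = 4 6 9 13 16
w = 5
cumlen = cumsum(n)
breaks = seq(0, tail(cumlen, 1) %/% w + 1) * w   # 0 5 10 15 20

# Bin index of each running total among the sorted breakpoints.
bin = findInterval(cumlen, breaks)               # 1 2 2 3 4
count = rle(bin)$lengths
pos = cumsum(count)
npos = rbind(posS = c(1, pos[-length(pos)] + 1), pos = pos)
npos
```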


But am I missing some existing functionality?
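As a point of comparison, here is a minimal sketch of greedy line breaking that works on the word lengths directly, without a join and a split. `greedy.breaks` is a hypothetical helper, not an existing function; it assumes words on a line are separated by single spaces and starts a new line whenever the next word plus its space would exceed w. Unlike strwrap(), it allows lines of exactly w characters, and a word longer than w gets a line of its own.

```r
# Hypothetical helper: greedy line breaking on a vector of word lengths.
# Assumes one space between adjacent words on the same line.
# Returns a 2-row matrix (posS, pos) of first/last word index per line.
greedy.breaks = function(n, w) {
    posS = integer(0)   # first word of each finished line
    pos = integer(0)    # last word of each finished line
    start = 1L          # first word of the current line
    lineLen = 0         # characters used on the current line
    for (i in seq_along(n)) {
        if (i == start) {
            lineLen = n[i]                  # first word: no leading space
        } else if (lineLen + 1 + n[i] <= w) {
            lineLen = lineLen + 1 + n[i]    # word fits after one space
        } else {
            posS = c(posS, start)           # close the current line ...
            pos = c(pos, i - 1L)
            start = i                       # ... and open a new one at word i
            lineLen = n[i]
        }
    }
    if (length(n) > 0) {                    # close the last line
        posS = c(posS, start)
        pos = c(pos, length(n))
    }
    rbind(posS, pos)
}

n = c(4, 2, 3, 4, 3)            # made-up word lengths
npos = greedy.breaks(n, w = 8)  # lines: words 1-2, 3-4, 5
npos
```

The result has the same layout as cut.character.int(), so the printing loop from the post can consume it unchanged; with the post's lenWords, a word ending in a newline counts that newline as one character.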


Note:

- Technically, the cut() function should probably return a vector of
line indices (something like rep(seq_along(count), count)), but it was
more practical to have both the start and end positions.
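The per-word index vector mentioned in the note can be built from the run lengths with rep(), and split() then groups the words by line. A small sketch on made-up words and counts (`lineId` and `lineText` are illustrative names):

```r
words = c("a", "bb", "ccc", "dd", "e")   # made-up words
count = c(1L, 2L, 2L)                    # words per line, e.g. from rle()$lengths

lineId = rep(seq_along(count), count)    # 1 2 2 3 3: line index of each word
lines = split(words, lineId)             # list with one word vector per line
lineText = vapply(lines, paste, character(1), collapse = " ")
cat(lineText, sep = "\n")
```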



Many thanks,


Leonard
