Re: [R] Exceptional slowness with read.csv

2024-04-08 Thread jim holtman
Try reading the lines in with readLines, then count the number of each type of
quote in each line. Find the lines where either count is odd and investigate those.
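
A minimal sketch of that check (assuming the file name is in `file_name`; the
counting trick and variable names are illustrative, not from the original post):

```r
# read the raw lines without any CSV parsing
lines <- readLines(file_name)

# quotes per line: length before vs. after stripping the quote character
n_dq <- nchar(lines) - nchar(gsub('"', "", lines, fixed = TRUE))
n_sq <- nchar(lines) - nchar(gsub("'", "", lines, fixed = TRUE))

# lines where either count is odd are the ones to inspect
bad <- which(n_dq %% 2 == 1 | n_sq %% 2 == 1)
lines[bad]
```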

On Mon, Apr 8, 2024, 15:24 Dave Dixon  wrote:

> I solved the mystery, but not the problem. The problem is that there's
> an unclosed quote somewhere in those 5 additional records I'm trying to
> access. So read.csv is reading million-character fields. It's slow at
> that. That mystery solved.
>
> However, the problem persists: how to fix what is obvious to the
> naked eye - a quote not adjacent to a comma - but that read.csv can't
> handle. readLines followed by read.csv(text= ) works great because, in
> that case, read.csv knows where the record terminates. Meaning, read.csv
> throws an exception that I can catch and handle with a quick and clean
> regex.
>
> Thanks, I'll take a look at vroom.
>
> -dave
>
> On 4/8/24 09:18, Stevie Pederson wrote:
> > Hi Dave,
> >
> > That's rather frustrating. I've found vroom (from the package vroom)
> > to be helpful with large files like this.
> >
> > Does the following give you any better luck?
> >
> > vroom(file_name, delim = ",", skip = 2459465, n_max = 5)
> >
> > Of course, when you know you've got errors & the files are big like
> > that it can take a bit of work resolving things. The command line
> > tools awk & sed might even be a good plan for finding lines that have
> > errors & figuring out a fix, but I certainly don't envy you.
> >
> > All the best
> >
> > Stevie
> >
> > On Tue, 9 Apr 2024 at 00:36, Dave Dixon  wrote:
> >
> > Greetings,
> >
> > I have a csv file of 76 fields and about 4 million records. I know
> > that
> > some of the records have errors - unmatched quotes, specifically.
> > Reading the file with readLines and parsing the lines with
> > read.csv(text
> > = ...) is really slow. I know that the first 2459465 records are
> > good.
> > So I try this:
> >
> >  > startTime <- Sys.time()
> >  > first_records <- read.csv(file_name, nrows = 2459465)
> >  > endTime <- Sys.time()
> >  > cat("elapsed time = ", endTime - startTime, "\n")
> >
> > elapsed time =   24.12598
> >
> >  > startTime <- Sys.time()
> >  > second_records <- read.csv(file_name, skip = 2459465, nrows = 5)
> >  > endTime <- Sys.time()
> >  > cat("elapsed time = ", endTime - startTime, "\n")
> >
> > This appears to never finish. I have been waiting over 20 minutes.
> >
> > So why would (skip = 2459465, nrows = 5) take orders of magnitude
> > longer
> > than (nrows = 2459465) ?
> >
> > Thanks!
> >
> > -dave
> >
> > PS: readLines(n=2459470) takes 10.42731 seconds.
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > 
> > and provide commented, minimal, self-contained, reproducible code.
> >



Re: [R] Trouble reading a UTF-16LE file

2024-02-28 Thread jim holtman
Try this:


> x <- file("C:\\Users\\Jim\\Downloads\\PV2-ch2 - R_Help.ANA",
+           encoding = "UTF-16")
> y <- readLines(x)
> head(y)
[1] "1\t36,74\t0"      "2\t269,02\t-44"   "1\t326,62\t29"    "2\t354,52\t24"
[5] "8\t390,75\t1838"  "2\t395,11\t-1053"
>

Thanks

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Wed, Feb 28, 2024 at 9:23 AM Ebert,Timothy Aaron  wrote:

> The earlier post had an attached text file that did not go through.
> I hope this link works. I tested it with a coworker, but that is no
> guarantee.
>
>
> https://uflorida-my.sharepoint.com/:u:/g/personal/tebert_ufl_edu/EXf5u_CtTwJCrhdfTBIPr7wBefZHx4P_suj4wAWb8i8HFA?e=iQawhh
>
>
> Regards,
> Tim
>


Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-29 Thread jim holtman
Check out the 'officer' package.
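
One possible starting point with officer (an untested sketch; the file name and
the Lexis+ layout details — e.g. that keywords sit in paragraphs beginning with
"Subject" and look like "FAST FASHION (72%)" — are assumptions taken from the
question, not verified against real files):

```r
library(officer)

# pull all paragraphs of one .docx into a data frame, keep the text column
doc <- read_docx("article.docx")
txt <- docx_summary(doc)$text

# find the 'Subject' block and extract keyword-percentage pairs
subj <- txt[grep("^Subject", txt)]
kw   <- regmatches(subj, gregexpr("[A-Z][A-Z ]+ \\(\\d+%\\)", subj))[[1]]

# keep only keywords whose coverage is >= 50%
pct <- as.numeric(sub(".*\\((\\d+)%\\).*", "\\1", kw))
kw[pct >= 50]
```

Looping this over list.files(dir, pattern = "\\.docx$") and rbind-ing the
results would give the spreadsheet rows; writing it out can be done with
write.csv.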

Thanks

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Fri, Dec 29, 2023 at 10:14 AM Andy  wrote:

> Hello
>
> I am trying to work through a problem, but feel like I've gone down a
> rabbit hole. I'd very much appreciate any help.
>
> The task: I have several directories of multiple (some directories, up
> to 2,500+) *.docx files (newspaper articles downloaded from Lexis+) that
> I want to iterate through to append to a spreadsheet only those articles
> that satisfy a condition (i.e., a specific keyword is present for >= 50%
> coverage of the subject matter). Lexis+ has a very specific structure
> and keywords are given in the row "Subject".
>
> I'd like to be able to accomplish the following:
>
> (1) Append the title, the month, the author, the number of words, and
> page number(s) to a spreadsheet
>
> (2) Read each article and extract keywords (in the docs, these are
> listed in 'Subject' section as a list of keywords with a percentage
> showing the extent to which the keyword features in the article (e.g.,
> FAST FASHION (72%)) and to append the keyword and the % coverage to the
> same row in the spreadsheet. However, I want to ensure that the keyword
> coverage meets the threshold of >= 50%; if not, then pass onto the next
> article in the directory. Rinse and repeat for the entire directory.
>
> So far, I've tried working through some Stack Overflow-based solutions,
> but most seem to use the textreadr package, which is now deprecated;
> others use either the officer or the officedown packages. However, these
> packages don't appear to do what I want the program to do, at least not
> in any of the examples I have found, nor in the vignettes and relevant
> package manuals I've looked at.
>
> The first point is, is what I am intending to do even possible using R?
> If it is, then where do I start with this? If these docx files were
> converted to UTF-8 plain text, would that make the task easier?
>
> I am not a confident coder, and am really only just getting my head
> around R so appreciate a steep learning curve ahead, but of course, I
> don't know what I don't know, so any pointers in the right direction
> would be a big help.
>
> Many thanks in anticipation
>
> Andy
>


Re: [R] strptime with +03:00 zone designator

2023-11-05 Thread jim holtman
try using 'lubridate'

> library(lubridate)

Attaching package: ‘lubridate’

The following objects are masked from ‘package:base’:

    date, intersect, setdiff, union

> x <- "2017-02-28T13:35:00+03:00"
> ymd_hms(x)
[1] "2017-02-28 10:35:00 UTC"
>
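
If staying in base R is preferred, one workaround (a sketch, not the only way)
is to strip the colon out of the zone designator before parsing, so that %z
matches:

```r
x <- "2017-02-28T13:35:00+03:00"

# drop the colon inside the offset: "+03:00" -> "+0300"
x2 <- sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)

# %z now accepts the offset; report the result in UTC
as.POSIXct(x2, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
# "2017-02-28 10:35:00 UTC"
```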



Thanks

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Sun, Nov 5, 2023 at 3:45 PM Richard O'Keefe  wrote:

> I have some data that includes timestamps like this:
> 2017-02-28T13:35:00+03:00
> The documentation for strptime says that %z expects
> an offset like 0300.  I don't see any way in the documentation
> to get it to accept +hh:mm with a colon separator, and
> everything I tried gave me NA as the answer.
>
> Section 4.2.5.1 of ISO 8601:2004(E) allows both the
> absence of colons in +hh[mm] (basic format) and the
> presence of colons in +hh:mm (extended format).
> Again in section 4.2.5.2 where a zone offset is combined
> with a time of day: if you have hh:mm:ss you are using
> extended format and the offset MUST have a colon; if
> you have hhmmss you are using basic format and the
> offset MUST NOT have a colon.  And again in section
> 4.3.2 (complete representations of date and time of day).
> If you use hyphens and colons in the date and time part
> you MUST have a colon in the zone designator.
>
> So I am dealing with timestamps in strict ISO 8601
> complete extended representation, and it is rather
> frustrating that strptime doesn't deal with it simply.
>
> The simplest thing would be for R's own version of
> strptime to allow an optional colon between the hour
> digits and the minute digits of a zone designator.
>
> I'm about to clone the data source and edit it to
> remove the colons, but is there something obvious
> I am missing?
>


Re: [R] Sum data according to date in sequence

2023-11-03 Thread jim holtman
Is this what you are after?

library(tidyverse)


library(lubridate)

input <- structure(list(StationName = c("PALO ALTO CA / CAMBRIDGE #1",
  "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
  "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
  "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
  "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
  "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
  "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
  "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
  "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
  "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
  "PALO ALTO CA / CAMBRIDGE #1"), date = c("1/14/2016", "1/14/2016",
   "1/14/2016", "1/15/2016", "1/15/2016", "1/15/2016", "1/15/2016",
   "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016",
   "1/16/2016", "1/16/2016", "1/17/2016", "1/17/2016", "1/17/2016",
   "1/17/2016", "1/17/2016", "1/18/2016"), time = c("12:09", "19:50",
  "20:22", "8:25", "14:23", "18:17", "21:46", "10:19", "12:12",
  "14:12", "16:22", "19:16", "19:19", "20:24", "9:54", "12:16",
  "13:53", "19:03", "22:00", "8:58"),
  EnergykWh = c(4.680496, 6.272414,
  1.032782, 11.004884, 10.096824, 6.658797, 4.808874, 1.469384,
  2.996239, 0.303222, 4.988339, 8.131804, 0.117156, 3.285669, 1.175608,
  3.677487, 1.068393, 8.820755, 8.138583, 9.0575)),
  row.names = c(NA, 20L), class = "data.frame")
# convert date from character to Date
byDate <- input |>
  mutate(newdate = mdy(date)) |>
  group_by(newdate) |>
  summarise(total = sum(EnergykWh))

byDate

## # A tibble: 5 × 2
##   newdatetotal
##
## 1 2016-01-14 12.0
## 2 2016-01-15 32.6
## 3 2016-01-16 21.3
## 4 2016-01-17 22.9
## 5 2016-01-18  9.06
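
For comparison, the same daily totals can be computed without the tidyverse (a
base-R sketch using the `input` data frame defined above):

```r
# convert the character date, then sum EnergykWh per day
input$newdate <- as.Date(input$date, format = "%m/%d/%Y")
aggregate(EnergykWh ~ newdate, data = input, FUN = sum)
```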


Thanks

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Fri, Nov 3, 2023 at 2:51 AM roslinazairimah zakaria 
wrote:

> Hi,
> I tried this:
> # extract date from the time stamp
> dt1 <- cbind(as.Date(dt$EndDate, format="%m/%d/%Y"), dt$EnergykWh)
> head(dt1)
> colnames(dt1) <- c("date", "EnergykWh")
> and
> my dt1 becomes these, the dates are replace by numbers.
>
> dt1 <- cbind(as.Date(dt$EndDate, format="%m/%d/%Y"), dt$EnergykWh)
> dput(head(dt1))
> colnames(dt1) <- c("date", "EnergykWh")
> dput(head(dt1))
>
>
> > dput(head(dt1))
> structure(c(16814, 16814, 16814, 16815, 16815, 16815, 4.680496,
> 6.272414, 1.032782, 11.004884, 10.096824, 6.658797), dim = c(6L,
> 2L), dimnames = list(NULL, c("date", "EnergykWh")))
>
> Then I tried this:
> library(dplyr)
> dt1 %>%
>   group_by(date) %>%
>   summarise(EnergykWh.sum = sum(EnergykWh))
> and got this errors
>
> > dt1 %>%
> +   group_by(date) %>%
> +   summarise(EnergykWh.sum = sum(EnergykWh))
> Error in UseMethod("group_by") :
>   no applicable method for 'group_by' applied to an object of class
>   "c('matrix', 'array', 'double', 'numeric')"
>
>
>
> On Fri, Nov 3, 2023 at 7:23 AM roslinazairimah zakaria <
> roslina...@gmail.com>
> wrote:
>
> > Dear all,
> >
> > I have this set of data. I would like to sum the EnergykWh according date
> > sequences.
> >
> > > head(dt1,20)   StationName  date  time EnergykWh
> > 1  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 12:09  4.680496
> > 2  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 19:50  6.272414
> > 3  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 20:22  1.032782
> > 4  PALO ALTO CA / CAMBRIDGE #1 1/15/2016  8:25 11.004884
> > 5  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 14:23 10.096824
> > 6  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 18:17  6.658797
> > 7  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 21:46  4.808874
> > 8  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 10:19  1.469384
> > 9  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 12:12  2.996239
> > 10 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 14:12  0.303222
> > 11 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 16:22  4.988339
> > 12 PALO ALTO CA / CAMBRIDGE #1 1/16/20

Re: [R] Sum data according to date in sequence

2023-11-02 Thread jim holtman
How about sending a 'dput' of some sample data?  My guess is that your date is
'character' and not 'Date'.
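
A two-line illustration of why the class matters — character dates sort and
group lexically, Date values chronologically (sketch with made-up values):

```r
d <- c("1/14/2016", "1/2/2016")
sort(d)                                # character: "1/14/2016" sorts first
sort(as.Date(d, format = "%m/%d/%Y"))  # Date: 2016-01-02 comes first
```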

Thanks

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Thu, Nov 2, 2023 at 4:24 PM roslinazairimah zakaria 
wrote:

> Dear all,
>
> I have this set of data. I would like to sum the EnergykWh according date
> sequences.
>
> > head(dt1,20)   StationName  date  time EnergykWh
> 1  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 12:09  4.680496
> 2  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 19:50  6.272414
> 3  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 20:22  1.032782
> 4  PALO ALTO CA / CAMBRIDGE #1 1/15/2016  8:25 11.004884
> 5  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 14:23 10.096824
> 6  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 18:17  6.658797
> 7  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 21:46  4.808874
> 8  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 10:19  1.469384
> 9  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 12:12  2.996239
> 10 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 14:12  0.303222
> 11 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 16:22  4.988339
> 12 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:16  8.131804
> 13 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:19  0.117156
> 14 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 20:24  3.285669
> 15 PALO ALTO CA / CAMBRIDGE #1 1/17/2016  9:54  1.175608
> 16 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 12:16  3.677487
> 17 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 13:53  1.068393
> 18 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 19:03  8.820755
> 19 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 22:00  8.138583
> 20 PALO ALTO CA / CAMBRIDGE #1 1/18/2016  8:58  9.057500
>
> I have tried this:
> library(dplyr)
> sums <- dt1 %>%
>   group_by(date) %>%
>   summarise(EnergykWh = sum(EnergykWh))
>
> head(sums,20)
>
> The date is not by daily sequence but by year sequence.
>
> > head(sums,20)
> # A tibble: 20 × 2
>    date      EnergykWh
>  1 1/1/2017     25.3
>  2 1/1/2018     61.0
>  3 1/1/2019      0.627
>  4 1/1/2020     10.7
>  5 1/10/2017    69.4
>  6 1/10/2018    54.5
>  7 1/10/2019    49.1
>  8 1/10/2020    45.9
>  9 1/11/2017    73.9
> 10 1/11/2018    53.3
> 11 1/11/2019    93.5
> 12 1/11/2020    66.7
> 13 1/12/2017    78.6
> 14 1/12/2018    42.2
> 15 1/12/2019    22.7
> 16 1/12/2020    80.9
> 17 1/13/2017    85.6
> 18 1/13/2018    46.4
> 19 1/13/2019    40.0
> 20 1/13/2020   121.
>
>
>
> Thank you very much for any help given.
>
>
> --
> *Roslinazairimah Zakaria*
> *Tel: +609-5492370; Fax. No.+609-5492766*
>
> *Email: roslinazairi...@ump.edu.my ;
> roslina...@gmail.com *
> Faculty of Industrial Sciences & Technology
> University Malaysia Pahang
> Lebuhraya Tun Razak, 26300 Gambang, Pahang, Malaysia
>


Re: [R] How to Reformat a dataframe

2023-10-28 Thread jim holtman
You can also use the pivot_longer to do it:

library(tidyverse)

input <- structure(list(...1 = c(92.9925354, 76.0024254, 44.99547465,
28.00536465, 120.0068103, 31.9980405, 85.0071837, 40.1532933,
19.3120917, 113.12581575, 28.45843425, 114.400074, 143.925,
46.439634, 20.7845679, 50.82874575, 36.9818061, 44.6273556, 40.57804605,
:
:
:
 row.names = c(NA, -126L), class = "data.frame")

> input$row <- seq(nrow(input))  # add row number for reference
> head(input)
       ...1     ...2     ...3     ...4     ...5     ...6     ...7     ...8
1  92.99254 34.99963 24.04101 43.01330 53.00914 62.01390 91.01036 88.99986
2  76.00243 22.00219 22.00219 25.00378 44.99547 60.99449 63.00499 92.99254
3  44.99547 22.99328 15.00793 15.99902 38.00121 44.00438 62.01390 79.99510
4  28.00536 19.00061 12.99743 12.99743 44.00438 49.01647 55.95410 69.00816
5 120.00681 35.99072 22.99328 47.99706 60.00341 62.01390 60.00341 66.00658
6  31.99804 23.95606 13.98852 15.99902 38.99230 54.99132 88.00877 89.99095
      ...9     ...10     ...11     ...12 row
1 54.00023  75.01134 111.99314  49.01647   1
2 68.01707  75.01134  82.99669  63.99608   2
3 60.99449  91.01036  84.01609  65.01549   3
4 82.99669  78.01292 135.01474  85.99827   4
5 66.99767  91.01036  88.99986  51.98974   5
6 79.00401  78.01292 113.52225 155.00644   6
> x <- pivot_longer(input, names_to = "col", cols = 1:12)
> head(x, 20)
# A tibble: 20 × 3
     row col    value
 1     1 ...1    93.0
 2     1 ...2    35.0
 3     1 ...3    24.0
 4     1 ...4    43.0
 5     1 ...5    53.0
 6     1 ...6    62.0
 7     1 ...7    91.0
 8     1 ...8    89.0
 9     1 ...9    54.0
10     1 ...10   75.0
11     1 ...11  112.
12     1 ...12   49.0
13     2 ...1    76.0
14     2 ...2    22.0
15     2 ...3    22.0
16     2 ...4    25.0
17     2 ...5    45.0
18     2 ...6    61.0
19     2 ...7    63.0
20     2 ...8    93.0
>

Thanks

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Fri, Oct 27, 2023 at 10:41 PM Paul Bernal  wrote:

> Hi Iris,
>
> Thank you so much for your valuable feedback. I wonder why your code gives
> you 1512 rows, given that the original structure has 12 columns and 126
> rows, so I would expect (125*12)+ 9=1,509 total rows.
>
> Cheers,
> Paul
> El El vie, 27 de oct. de 2023 a la(s) 10:40 p. m., Iris Simmons <
> ikwsi...@gmail.com> escribió:
>
> > You are not getting the structure you want because the indexes are
> > wrong. They should be something more like this:
> >
> > i <- 0
> > for (row in 1:nrow(alajuela_df)){
> >   for (col in 1:ncol(alajuela_df)){
> > i <- i + 1
> > df[i,1]=alajuela_df[row,col]
> >   }
> > }
> >
> > but I think what you are doing can be written much shorter and will run
> > faster:
> >
> > ## transpose here matches your original code
> > df <- data.frame(aportes_alajuela = c(t(alajuela_df)))
> >
> > ## but if you do not want to transpose, then do this
> > df <- data.frame(aportes_alajuela = unlist(alajuela_df, use.names =
> FALSE))
> >
> > However, you said you expected 1509 observations, but this gives you
> > 1512 observations. If you want to exclude the 3 NA observations, do
> > something like:
> >
> > df <- df[!is.na(df$aportes_alajuela), , drop = FALSE]
> >
> > On Fri, Oct 27, 2023 at 11:14 PM Paul Bernal 
> > wrote:
> > >
> > > Dear friends,
> > >
> > > I have the following dataframe:
> > > dim(alajuela_df)
> > > [1] 126  12
> > >
> > > dput(alajuela_df)
> > > structure(list(...1 = c(92.9925354, 76.0024254, 44.99547465,
> > > 28.00536465, 120.0068103, 31.9980405, 85.0071837, 40.1532933,
> > > 19.3120917, 113.12581575, 28.45843425, 114.400074, 143.925,
> > > 46.439634, 20.7845679, 50.82874575, 36.9818061, 44.6273556,
> 40.57804605,
> > > 30.38398005, 47.94042705, 36.38715225, 28.06199835, 28.4867511,
> > > 122.86681215, 56.4071652, 35.9057658, 52.669341, 24.94714485,
> > > 54.4249857, 61.164396, 47.88379335, 30.582198, 26.051502, 43.041612,
> > > 64.59073485, 51.6499344, 78.8202902886201, 35.2390173175627,
> > > 82.2394568898745, 47.760850180466, 54.3654763342294, 49.4878058854839,
> > > 32.8813266149642, 38.9301880693548, 51.9506275455197, 55.4404001992832,
> > > 50.7979761262545, 37.1198211413082, 36.9144309425627, 33.7829493281362,
> > > 32.8647492475806, 42.892686344, 63.9814428257048, 39.219040238172,
> > > 88.7557324417563, 42.0964144925627, 129.15973304991, 117.872998635484,
> > > 35.4004098300179, 83.4102757505377, 38.6443638074373, 100.491764259319,
> > > 40.219162961828, 35.901029409319, 85.281471467473

Re: [R] query in loops

2022-12-05 Thread jim holtman
So what is the problem that you would like help in correcting?  The program
seems to run.

Thanks

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Mon, Dec 5, 2022 at 12:59 PM ASHLIN VARKEY 
wrote:

> Sir,
> I want to write a loop in R to find the AIC factor. For its calculation, I
> need to run an algorithm in the attached file. Here  'x' represents the
> dataset and xi denotes the i-th observation after arranging it in ascending
> order. Q(u) and q(u) represent the quantile function and quantile density
> function respectively. For my distribution Q(u) and q(u) are given below.
> Q(u) = -α log(1-u) + (b-α)u + ((r-b)/2)u^2
> q(u) = b + u(r-b + α/(1-u)).
> Can you please help me to correct this program based on the algorithm?
> *R code*
> x=c(0.047, 0.296, 0.540, 1.271, 0.115, 0.334, 0.570, 1.326, 0.121, 0.395,
> 0.641, 1.447, 0.132, 0.458, 0.644, 1.485, 0.164, 0.466, 0.696, 1.553,
> 0.197, 0.501, 0.841,1.581,
> 0.203,0.507, 0.863, 1.589, 3.743, 0.260, 0.529, 1.099, 2.178, 0.282, 0.534,
> 1.219, 2.343, 2.416, 2.444, 2.825, 2.830, 3.578, 3.658, 3.978, 4.033)
> xi=sort(x)
> xi
> n=45
> alpha=-1.014
> b=.949
> r=3.11
> u=c()
> D=c()
> q=c()
> Q=c()
> for (i in 1:n) {
> u[i]=i/(n+1)
> Q[i]=-alpha*log(1-u[i])+(b-alpha)*u[i]+((r-b)/2)*(u[i]^2)
> q[i]=b+u[i]*(r-b+(alpha/(1-u[i])))
> D[i]=Q[i]-xi[i]
> if (D[i]<(10^-7)) {
>   print (q[i])
> }
> else{
>   u[i+1]=u[i]+((xi[i]-Q[i])/q[i])
> }
>   }


Re: [R] Converting a Date variable from character to Date

2022-09-29 Thread jim holtman
Try this by adding a "day" to the date field:

library(tidyverse)
library(lubridate)
input <- "*Period    CPI*
2022m1     4994
2022m2     5336
2022m3     5671
2022m4     6532
2022m5     7973
2022m6    10365
2022m7    12673
2022m8    14356
2022m9    14708"

m_data <- read.delim(text = input, sep = "")

# convert the date by adding a "day" before the conversion

m_data$date <- ymd(paste0(m_data$X.Period, '-1'))
m_data

##   X.Period  CPI.   date
## 1   2022m1  4994 2022-01-01
## 2   2022m2  5336 2022-02-01
## 3   2022m3  5671 2022-03-01
## 4   2022m4  6532 2022-04-01
## 5   2022m5  7973 2022-05-01
## 6   2022m6 10365 2022-06-01
## 7   2022m7 12673 2022-07-01
## 8   2022m8 14356 2022-08-01
    ## 9   2022m9 14708 2022-09-01
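
The same conversion also works in base R, since %m accepts a one- or two-digit
month (a sketch on a single value):

```r
# "2022m1" + "-1" -> "2022m1-1", parsed as year, literal 'm', month, day
as.Date(paste0("2022m1", "-1"), format = "%Ym%m-%d")
# "2022-01-01"
```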



Thanks

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Thu, Sep 29, 2022 at 9:36 AM Admire Tarisirayi Chirume <
atchir...@gmail.com> wrote:

> Kindly request assistance to *convert a Date variable from a character to
> be recognized as a date*.
> NB: kindly take note that the data is in a csv file called *inflation*. I
> have included part of the file content herewith with the header for
> assistance.
>
>
> My data looks like this:
> *Period    CPI*
> 2022m1     4994
> 2022m2     5336
> 2022m3     5671
> 2022m4     6532
> 2022m5     7973
> 2022m6    10365
> 2022m7    12673
> 2022m8    14356
> 2022m9    14708
>
>  I used the following command lines.
>
>
> class(inflation.2$cpi)
> inflation.2$cpi <- as.numeric(as.character(inflation.2$cpi))
> *format(as.Date(inflation.2$period), "%Y-%m")*
>
> Having run the command lines above, the variable *period* in the attached
> CSV file remains being read as a character variable. Kindly assist.
>
> Thank you.
>
>
> Alternative email: addtar...@icloud.com/tchir...@rbz.co.zw
> Skype: admirechirume
> Call: +263773369884
> whatsapp: +818099861504
>
>
> On Thu, Sep 29, 2022 at 6:10 PM Jeff Newmiller 
> wrote:
>
> > Your attachment was stripped by the mailing list. The criteria for
> allowed
> > attachments are a bit tricky to translate into actions to apply to your
> > email software, so usually including part of your file in the body of the
> > email is the most successful approach for communicating your problem. Be
> > sure to use a text editor or the
> >
> >   readLines("filename.csv") |> head() |> dput()
> >
> > functions in R to extract lines of your file for inclusion in the email.
> >
> > On September 29, 2022 8:52:30 AM PDT, Admire Tarisirayi Chirume <
> > atchir...@gmail.com> wrote:
> > >I kindly request for assistance to convert a Date variable from a
> > character
> > >to be recognised as a date. I used the following command lines.
> > >
> > >inflation<-read.csv("Inflation_forecasts_1.csv")
> > >attach(inflation)
> > >inflation[,1:2 ] #subsetting the dataframe
> > >#Renaming variables
> > >inflation<- rename(inflation.df,
> > >   cpi = CPI,
> > >   year=period)
> > >
> > >#subsetting data April 2020 to current
> > >inflation.2<-data.frame(inflation[-c(1:135),])
> > >class(inflation.2$cpi)
> > >inflation.2$cpi <- as.numeric(as.character(inflation.2$cpi))
> > >* format(as.Date(inflation.2$period), "%Y-%m")*
> > >
> > >Having ran the command lines above, the variable period in the attached
> > csv
> > >file remains being read as a character variable. Kindly assist.
> > >
> > >Thank you.
> > --
> > Sent from my phone. Please excuse my brevity.
> >
>


Re: [R] How long does it take to learn the R programming language?

2022-09-29 Thread jim holtman
Still at it after 38 years.  First came across S at Bell Labs in 1984.

Thanks

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Thu, Sep 29, 2022 at 7:09 AM Ebert,Timothy Aaron  wrote:

> Learning R takes an hour. Find an hourglass, flip it over. Meanwhile we
> will start increasing the size of the upper chamber and adding more sand.
>
> Mastery of R is an asymptotic function of time.
>
> While such answers might indicate trying for mastery is futile, you can
> learn enough R to be very useful long before "mastery."
>
> Tim
> -Original Message-
> From: R-help  On Behalf Of Avi Gross
> Sent: Wednesday, September 28, 2022 5:51 PM
> To: John Kane 
> Cc: R. Mailing List 
> Subject: Re: [R] How long does it take to learn the R programming language?
>
> [External Email]
>
> So is the proper R answer simply Inf?
>
> On Wed, Sep 28, 2022, 5:39 PM John Kane  wrote:
>
> > + 1
> >
> > On Wed, 28 Sept 2022 at 17:36, Jim Lemon  wrote:
> >
> > > Given some of the questions that are posted to this list, I am not
> > > sure that there is an upper bound to the estimate.
> > >
> > > Jim
> > >
> >
> >
> > --
> > John Kane
> > Kingston ON Canada
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat
> > .ethz.ch%2Fmailman%2Flistinfo%2Fr-helpdata=05%7C01%7Ctebert%40ufl
> > .edu%7C7229f6c17d764bd2742c08daa19bb65b%7C0d4da0f84a314d76ace60a62331e
> > 1b84%7C0%7C0%7C63787396320713%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4w
> > LjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C
> > sdata=8KNANsIMtWiElOAwn9pXvx%2BsueyNn329VkvFFx8Paew%3Dreserv
> > ed=0
> > PLEASE do read the posting guide
> > https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r
> > -project.org%2Fposting-guide.htmldata=05%7C01%7Ctebert%40ufl.edu%
> > 7C7229f6c17d764bd2742c08daa19bb65b%7C0d4da0f84a314d76ace60a62331e1b84%
> > 7C0%7C0%7C63787396320713%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwM
> > DAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C
> > sdata=32nVjz3UeC4QK7dd2PHA76BywkYQP9ucuN%2FWFFAUX8k%3Dreserved=0
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>
> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-helpdata=05%7C01%7Ctebert%40ufl.edu%7C7229f6c17d764bd2742c08daa19bb65b%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C63787396320713%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7Csdata=8KNANsIMtWiElOAwn9pXvx%2BsueyNn329VkvFFx8Paew%3Dreserved=0
> PLEASE do read the posting guide
> https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.r-project.org%2Fposting-guide.htmldata=05%7C01%7Ctebert%40ufl.edu%7C7229f6c17d764bd2742c08daa19bb65b%7C0d4da0f84a314d76ace60a62331e1b84%7C0%7C0%7C63787396320713%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2l
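The quote-counting check suggested at the top of this thread can be sketched in a few lines of base R. The `lines` vector below is toy stand-in data, not the poster's real file:

```r
# Toy stand-in for readLines(file): line 2 has an unmatched quote
lines <- c('1,"ok","fine"', '2,"broken,value', '3,"also ok"')

# Count the double quotes on each line; an odd count means an unmatched quote
n_quotes <- nchar(gsub('[^"]', '', lines))
bad <- which(n_quotes %% 2 != 0)
bad  # line numbers worth investigating
```

On a 4-million-record file this stays fast because `gsub()` and `nchar()` are vectorized over the whole character vector at once.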

Re: [R] able to estimate in the excel but not in R, any suggestion?

2021-12-23 Thread jim holtman
Glad to help!

Happy Holidays

Thanks

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Wed, Dec 22, 2021 at 11:12 PM Marna Wagley  wrote:
>
> Dear Jim,
> Thank you very much for the help. The code seems to be right. Using this 
> code, I got exactly the same value as the excel's value.
> This is great.
> Thanks
> MW
>
> On Wed, Dec 22, 2021 at 10:57 PM jim holtman  wrote:
>>
>> You need to use the 'ifelse' function.  I think I copied down your
>> formula and here is the output:
>>
>> > daT<-structure(list(sd = c(0.481, 0.682, 0.741, 0.394, 0.2, 0.655, 0.375),
>> + mcd = c(51.305, 51.284, 51.249, 51.2, 51.137, 51.059, 50.968), ca =
>> + c(49.313, 69.985, 75.914, 40.303, 20.493, 66.905,38.185)), class =
>> + "data.frame", row.names = c(NA, -7L))
>> > head(daT)
>>      sd    mcd     ca
>> 1 0.481 51.305 49.313
>> 2 0.682 51.284 69.985
>> 3 0.741 51.249 75.914
>> 4 0.394 51.200 40.303
>> 5 0.200 51.137 20.493
>> 6 0.655 51.059 66.905
>> >
>> > # add in a new column with the calculation
>> >
>> > daT$ca_1 <- with(daT,
>> +ifelse(sd > mcd * 2,
>> +   pi * mcd ^ 2,
>> +   (0.5 * sd) * sqrt(mcd^2 - (0.5 * sd)^2) +
>> +   mcd^2 * asin((0.5 * sd) / (mcd)) * 2
>> +   )
>> + )
>> >
>> > daT
>>      sd    mcd     ca     ca_1
>> 1 0.481 51.305 49.313 37.01651
>> 2 0.682 51.284 69.985 52.46340
>> 3 0.741 51.249 75.914 56.96310
>> 4 0.394 51.200 40.303 30.25918
>> 5 0.200 51.137 20.493 15.34110
>> 6 0.655 51.059 66.905 50.16535
>> 7 0.375 50.968 38.185 28.66948
>> >
>>
>>
>> Thanks
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>> On Wed, Dec 22, 2021 at 10:23 PM Marna Wagley  wrote:
>> >
>> > Hi R users,
>> > I was trying to estimate some values in r but could not figure out how to
>> > write the script in r. Although I was able to estimate it correctly in the
>> > excel. For example I have the following data set.
>> >
>> > daT<-structure(list(sd = c(0.481, 0.682, 0.741, 0.394, 0.2, 0.655, 0.375),
>> > mcd = c(51.305, 51.284, 51.249, 51.2, 51.137, 51.059, 50.968), ca =
>> > c(49.313, 69.985, 75.914, 40.303, 20.493, 66.905,38.185)), class =
>> > "data.frame", row.names = c(NA, -7L))
>> > head(daT)
>> >
>> > In this data set, I need to estimate in the column name "ca", In the excel
>> > I estimated the value using the following formula:
>> > IF(A2>B2*2,PI()*B2^2,((0.5*A2)*SQRT(B2^2-(0.5*A2)^2)+B2^2*ASIN((0.5*A2)/B2))*2)
>> >
>> > But when I wrote the following code in the R, it did not work
>> > attach(daT)
>> > daT$ca<-if(sd>mcd*2,pi()*mcd^2,((0.5*sd)*sqrt(mcd^2-(0.5*sd)^2)+mcd^2*asin((0.5*sd)/mcd))*2)
>> >
>> > Your suggestion would be highly appreciated.
>> >
>> > Sincerely,
>> >
>> > MW
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > __
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide 
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
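The heart of the fix above is that `if` tests a single logical value while `ifelse()` evaluates its condition element-wise over whole vectors. A minimal base-R sketch with made-up numbers (not the poster's data):

```r
sd  <- c(0.4, 120)
mcd <- c(51, 50)

# if (sd > mcd * 2) ... would look only at the first element;
# ifelse() evaluates the condition for every element at once
res <- ifelse(sd > mcd * 2, pi * mcd^2, 0)
res
```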


Re: [R] able to estimate in the excel but not in R, any suggestion?

2021-12-22 Thread jim holtman
You need to use the 'ifelse' function.  I think I copied down your
formula and here is the output:

> daT<-structure(list(sd = c(0.481, 0.682, 0.741, 0.394, 0.2, 0.655, 0.375),
+ mcd = c(51.305, 51.284, 51.249, 51.2, 51.137, 51.059, 50.968), ca =
+ c(49.313, 69.985, 75.914, 40.303, 20.493, 66.905,38.185)), class =
+ "data.frame", row.names = c(NA, -7L))
> head(daT)
     sd    mcd     ca
1 0.481 51.305 49.313
2 0.682 51.284 69.985
3 0.741 51.249 75.914
4 0.394 51.200 40.303
5 0.200 51.137 20.493
6 0.655 51.059 66.905
>
> # add in a new column with the calculation
>
> daT$ca_1 <- with(daT,
+ifelse(sd > mcd * 2,
+   pi * mcd ^ 2,
+   (0.5 * sd) * sqrt(mcd^2 - (0.5 * sd)^2) +
+   mcd^2 * asin((0.5 * sd) / (mcd)) * 2
+   )
+ )
>
> daT
     sd    mcd     ca     ca_1
1 0.481 51.305 49.313 37.01651
2 0.682 51.284 69.985 52.46340
3 0.741 51.249 75.914 56.96310
4 0.394 51.200 40.303 30.25918
5 0.200 51.137 20.493 15.34110
6 0.655 51.059 66.905 50.16535
7 0.375 50.968 38.185 28.66948
>


Thanks

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Wed, Dec 22, 2021 at 10:23 PM Marna Wagley  wrote:
>
> Hi R users,
> I was trying to estimate some values in r but could not figure out how to
> write the script in r. Although I was able to estimate it correctly in the
> excel. For example I have the following data set.
>
> daT<-structure(list(sd = c(0.481, 0.682, 0.741, 0.394, 0.2, 0.655, 0.375),
> mcd = c(51.305, 51.284, 51.249, 51.2, 51.137, 51.059, 50.968), ca =
> c(49.313, 69.985, 75.914, 40.303, 20.493, 66.905,38.185)), class =
> "data.frame", row.names = c(NA, -7L))
> head(daT)
>
> In this data set, I need to estimate in the column name "ca", In the excel
> I estimated the value using the following formula:
> IF(A2>B2*2,PI()*B2^2,((0.5*A2)*SQRT(B2^2-(0.5*A2)^2)+B2^2*ASIN((0.5*A2)/B2))*2)
>
> But when I wrote the following code in the R, it did not work
> attach(daT)
> daT$ca<-if(sd>mcd*2,pi()*mcd^2,((0.5*sd)*sqrt(mcd^2-(0.5*sd)^2)+mcd^2*asin((0.5*sd)/mcd))*2)
>
> Your suggestion would be highly appreciated.
>
> Sincerely,
>
> MW
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] for loop question in R

2021-12-22 Thread jim holtman
You may have to add an explicit 'print' to ggplot

library(ggplot2)
library(tidyverse)
y <- c("hwy","cty")
c <- c("cyl","class")
f <- c("hwy_cyl","cty_class")
mac <- data.frame(y,c,f)
for (i in 1:nrow(mac)) {
  p <- mpg %>% filter(hwy < 35) %>%
    ggplot(aes(x = displ, y = .data[[y[i]]], color = .data[[c[i]]])) + geom_point()
  print(p)  # ggplot objects are not auto-printed inside a loop
  ggsave(paste0("c:/temp/", f[i], ".jpg"), width = 9, height = 6, dpi = 1200, units = "in")
}

Thanks

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.




On Wed, Dec 22, 2021 at 9:08 AM Kai Yang via R-help
 wrote:
>
> Hello R team,
> I want to use a for loop to generate multiple plots with 3 parameters (y is
> for the y axis, c is for color, and f is for the file name in the output). I
> created a data frame to save the information and use it in the for loop. I
> use y[i], c[i] and f[i] in the loop, but it doesn't seem to work. Can anyone
> correct my code to make it work?
> Thanks,
> Kai
>
> library(ggplot2)
> library(tidyverse)
> y <- c("hwy","cty")
> c <- c("cyl","class")
> f <- c("hwy_cyl","cty_class")
> mac <- data.frame(y,c,f)
> for (i in nrow(mac)){
>   mpg %>% filter(hwy <35) %>%
>     ggplot(aes(x = displ, y = y[i], color = c[i])) + geom_point()
>   ggsave("c:/temp/f[i].jpg", width = 9, height = 6, dpi = 1200, units = "in")
> }
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Changing time intervals in data set

2021-12-15 Thread jim holtman
At least show a sample of the data and then what you would like as output.

Thanks

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve?*
*Tell me what you want to do, not how you want to do it.*


On Wed, Dec 15, 2021 at 6:40 AM Rich Shepard 
wrote:

> A 33-year set of river discharge data at one gauge location has recording
> intervals of 5, 10, and 30 minutes over the period of record.
>
> The data.frame/tibble has columns for year, month, day, hour, minute, and
> datetime.
>
> Would difftime() allow me to find the dates when the changes occurred?
>
> TIA,
>
> Rich
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
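The `difftime()` idea raised above can be sketched in base R. The timestamps below are invented five- and ten-minute readings, not the gauge data:

```r
ts <- as.POSIXct(c("1990-01-01 00:00", "1990-01-01 00:05",
                   "1990-01-01 00:10", "1990-01-01 00:20",
                   "1990-01-01 00:30"), tz = "UTC")

# diff() on POSIXct returns difftime objects; convert to minutes
gaps <- as.numeric(diff(ts), units = "mins")   # 5 5 10 10

# positions where the recording interval differs from the previous gap
change_idx <- which(diff(gaps) != 0) + 1
ts[change_idx + 1]   # timestamp at which the new interval starts
```

With the real data, `ts` could be built from the year/month/day/hour/minute columns (or taken straight from the datetime column) before applying the same `diff()` logic.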


Re: [R] tidyverse: read_csv() misses column

2021-11-01 Thread jim holtman
drop the select, or put tz in the select

Thanks

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve?*
*Tell me what you want to do, not how you want to do it.*


On Mon, Nov 1, 2021 at 3:39 PM Rich Shepard 
wrote:

> On Mon, 1 Nov 2021, CALUM POLWART wrote:
>
> > Mutate. Probably.
>
> Calum,
>
> I thought that I had it working, but I'm still missing a piece.
>
> For example,
> > cor_disc %>%
> + select(year, mon, day, hr, min) %>%
> + mutate(
> + sampdt = make_datetime(year, mon, day, hr, min)
> + )
> # A tibble: 415,263 × 6
>     year   mon   day    hr   min sampdt
>    <dbl> <dbl> <dbl> <dbl> <dbl> <dttm>
>  1  2009    10    23     0     0 2009-10-23 00:00:00
>  2  2009    10    23     0    15 2009-10-23 00:15:00
>  3  2009    10    23     0    30 2009-10-23 00:30:00
>  4  2009    10    23     0    45 2009-10-23 00:45:00
>  5  2009    10    23     1     0 2009-10-23 01:00:00
>  6  2009    10    23     1    15 2009-10-23 01:15:00
>  7  2009    10    23     1    30 2009-10-23 01:30:00
>  8  2009    10    23     1    45 2009-10-23 01:45:00
>  9  2009    10    23     2     0 2009-10-23 02:00:00
> 10  2009    10    23     2    15 2009-10-23 02:15:00
> # … with 415,253 more rows
>
> produces the sampdt column, but it, and the timezone, are not present in
> the
> cor_disc tibble:
>
> > cor_disc
> # A tibble: 415,263 × 8
>    site_nbr  year   mon   day    hr   min tz      cfs
>    <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
>  1 14171600  2009    10    23     0     0 PDT    8750
>  2 14171600  2009    10    23     0    15 PDT    8750
>  3 14171600  2009    10    23     0    30 PDT    8750
>  4 14171600  2009    10    23     0    45 PDT    8750
>  5 14171600  2009    10    23     1     0 PDT    8750
>  6 14171600  2009    10    23     1    15 PDT    8750
>  7 14171600  2009    10    23     1    30 PDT    8750
>  8 14171600  2009    10    23     1    45 PDT    8730
>  9 14171600  2009    10    23     2     0 PDT    8730
> 10 14171600  2009    10    23     2    15 PDT    8730
> # … with 415,253 more rows
>
> Is the error in the select() or mutate() function specifications?
>
> Thanks,
>
> Rich
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
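The missing piece in the exchange above is that the `select()` step built a new five-column tibble, so `sampdt` was never added to `cor_disc` itself. Mutating the original object and assigning the result back keeps every column. A small self-contained sketch (toy one-row data, assuming dplyr and lubridate are installed):

```r
library(dplyr)
library(lubridate)

# Toy one-row stand-in for cor_disc
d <- tibble(year = 2009, mon = 10, day = 23, hr = 0, min = 15, cfs = 8750)

# No select(): mutate() adds sampdt while keeping cfs and the rest
d <- d %>% mutate(sampdt = make_datetime(year, mon, day, hr, min))
names(d)
```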


Re: [R] Looping through data error

2021-04-13 Thread jim holtman
Your code was formatted incorrectly.  There is always a problem with the
'else' statement after an 'if' since in R there is no semicolon to mark the
end of a line.  Here might be a better format for your code.  I would
recommend the liberal use of "{}"s when using 'if/else'



i <- 0

for (i in 1:(nrow(PLC_Return) - 1)) {
  if (i == 1) {
NUMBER_OF_SHARES[i] = 100 / is.na(CLOSE_SHARE_PRICE[i])
  } else {
    if (is.na(PLC_Return[i, 1]) == is.na(PLC_Return[i + 1, 1])) {
  NUMBER_OF_SHARES[i] = 0
} else {
  NUMBER_OF_SHARES[i] = 100 / is.na(CLOSE_SHARE_PRICE[i])
    }
  }
}


Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve?*
*Tell me what you want to do, not how you want to do it.*


On Tue, Apr 13, 2021 at 5:51 AM e-mail ma015k3113 via R-help <
r-help@r-project.org> wrote:

> Dear All,I have a dataframe with 4 variables and I am trying to calculate
> how many shares can be purchased with £100 in the first year when the
> company was listed
>
> The data looks like:
>
> COMPANY_NUMBER YEAR_END_DATE CLOSE_SHARE_PRICE  NUMBER_OF_SHARES
> 22705       30/09/2002    NA       0
> 22705       30/09/2004    NA       0
> 22705       30/09/2005    6.55     0
> 22705       30/09/2006    7.5      0
> 22705       30/09/2007    9.65     0
> 22705       30/09/2008    6.55     0
> 1091347     31/01/2010    8.14     0
> 1091347     31/01/2011    11.38    0
> 11356069    30/06/2019    1.09     0
> SC192761    31/01/2000    NA       0
> SC192761    31/01/2001    NA       0
> SC192761    31/01/2002    NA       0
> SC192761    31/01/2004    NA       0
> SC192761    31/01/2005    NA       0
> SC192761    31/01/2006    1.09     0
> SC192761    31/01/2008    1.24     0
> SC192761    31/01/2009    0.9      0
> SC192761    31/01/2010    1.14     0
> SC192761    31/01/2011    1.25     0
> SC192761    31/01/2012    1.29     0
>
>
> The code I have written is
>
> i <- 0
>
> for (i in 1:(nrow(PLC_Return)-1))
> if (i == 1)
> {
> NUMBER_OF_SHARES[i] = 100/is.na(CLOSE_SHARE_PRICE[i])
> } else if
> (is.na(PLC_Return[i, 1]) == is.na(PLC_Return[i + 1, 1])
> {
> NUMBER_OF_SHARES[i]=0
> } else
> {
> NUMBER_OF_SHARES[i] = 100/is.na(CLOSE_SHARE_PRICE[i])
> }
>
>
> The error I get is Error: unexpected 'else' in:
>
> " NUMBER_OF_SHARES[i] = 0
> } else"
> > {NUMBER_OF_SHARES[i] = 100/is.na(CLOSE_SHARE_PRICE[i])}
> >
> > }
> Error: unexpected '}' in "}"
>
>
> Don't know how to fix it-any help will be appreciated.
>
>
> Kind regards
>
>
> Ahson
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Read

2021-02-22 Thread jim holtman
This gives the desired output:

> library(tidyverse)
> text <-  "x1  x2  x3 x4\n1 B12 \n2   C23 \n322 B32  D34 \n4   
>  D44 \n51 D53\n60 D62 "
>
> # read in the data as characters and split to a list
> input <- str_split(str_trim(read_lines(text)), ' +')
>
> max_cols <- 4  # assume a max of 4 columns
>
> # put the data in the correct column
> x_matrix <- do.call(rbind, map(input, ~{
+   result <- character(max_cols)
+   result[1] <- .x[1]
+   for (i in 2:length(.x)){
+ result[as.integer(str_sub(.x[i], -1))] <- .x[i]
+   }
+   result
+ }))
>
> # now add commas to convert to CSV
> x_csv <- apply(x_matrix, 1, paste, collapse = ',')
>
> # now read in and create desired output
> read_csv(x_csv)
# A tibble: 6 x 4
  x1    x2    x3    x4
  <chr> <chr> <chr> <chr>
1 1     B12   NA    NA
2 2     NA    C23   NA
3 322   B32   NA    D34
4 4     NA    NA    D44
5 51    NA    D53   NA
6 60    D62   NA    NA
>



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.




On Mon, Feb 22, 2021 at 6:20 PM Avi Gross via R-help
 wrote:
>
> This discussion is a bit weird so can we step back.
>
> Someone wants help on how to read in a file that apparently was not written
> following one of several consistent sets of rules.
>
> If it was fixed width, R has functions that can read that.
>
> If it was separated by commas, tabs, single spaces, arbitrary whitespace,
> with or without a header line, we have functions that can read that if
> properly called.
>
> ALL the above normally assume that all the resulting columns are the same
> length. If any are meant to be shorter, you still leave the separators in
> place and put some NA or similar into the result. And, the functions we
> normally talk about do NOT read in and produce multiple vectors but
> something like a data.frame.
>
> So the choice is either to make sure the darn data is in a consistent
> format, or try a different plan. Fair enough?
>
> Some are suggesting parsing it yourself line by line. Certainly that can be
> done. But unless you know some schema to help you disambiguate, what do you
> do if you reach a row that is too short and has enough data for two columns.
> Which of the columns do you assign it to? If you had a clear rule, ...
>
> And what if you have different data types? R does not handle that within a
> single vector or row of a data.frame, albeit it can if you make it a list
> column.
>
> If this data is a one-time thing, perhaps it should be copied into something
> like EXCEL by a human and edited so every column is filled as you wish and
> THEN saved as something like a CSV file and then it can happily be imported
> the usual way, including NA values as needed.
>
> If the person really wants 4 independent vectors of different lengths to
> read in, there are plenty of ways to do that and no need to lump them in
> this odd format.
>
>
>
> -Original Message-
> From: R-help  On Behalf Of jim holtman
> Sent: Monday, February 22, 2021 9:01 PM
> To: Jeff Newmiller 
> Cc: r-help@R-project.org (r-help@r-project.org) 
> Subject: Re: [R] Read
>
> It looks like we can look at the last digit of the data and that would be
> the column number; is that correct?
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
>
>
>
> On Mon, Feb 22, 2021 at 5:34 PM Jeff Newmiller 
> wrote:
> >
> > This gets it into a data frame. If you know which columns should be
> numeric you can convert them.
> >
> > s <-
> > "x1  x2  x3 x4
> > 1 B22
> > 2 C33
> > 322 B22  D34
> > 4 D44
> > 51 D53
> > 60 D62
> > "
> >
> > tc <- textConnection( s )
> > lns <- readLines(tc)
> > close(tc)
> > if ( "" == lns[ length( lns ) ] )
> >   lns <- lns[ -length( lns ) ]
> >
> > L <- strsplit( lns, " +" )
> > m <- do.call( rbind, lapply( L[-1], function(v)
> >   if ( length(v) < length( L[[1]] ) ) c( v, rep( NA, length( L[[1]] ) - length(v) ) ) else v ) )
> > colnames( m ) <- L[[1]]
> > result <- as.data.frame( m, stringsAsFactors = FALSE )
> > result
> >
> > On February 22, 2021 4:42:57 PM PST, Val  wrote:
> > >That is my problem. The spacing between columns is not consistent.
> > >It
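Jim's last-digit observation can be sketched in base R: split each line into tokens, then place every token after the first into the column named by its trailing digit. The `tokens` list below is toy input mirroring the thread's sample, and the trailing-digit convention is an assumption about the real file:

```r
tokens <- list(c("1", "B12"),
               c("322", "B32", "D34"),
               c("4", "D44"))
max_cols <- 4

rows <- lapply(tokens, function(tok) {
  out <- rep(NA_character_, max_cols)
  out[1] <- tok[1]                                   # first token is column 1
  for (t in tok[-1]) {
    col <- as.integer(substr(t, nchar(t), nchar(t))) # trailing digit = column
    out[col] <- t
  }
  out
})
m <- do.call(rbind, rows)
m
```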

Re: [R] Read

2021-02-22 Thread jim holtman
It looks like we can look at the last digit of the data and that would
be the column number; is that correct?

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.




On Mon, Feb 22, 2021 at 5:34 PM Jeff Newmiller  wrote:
>
> This gets it into a data frame. If you know which columns should be numeric 
> you can convert them.
>
> s <-
> "x1  x2  x3 x4
> 1 B22
> 2 C33
> 322 B22  D34
> 4 D44
> 51 D53
> 60 D62
> "
>
> tc <- textConnection( s )
> lns <- readLines(tc)
> close(tc)
> if ( "" == lns[ length( lns ) ] )
>   lns <- lns[ -length( lns ) ]
>
> L <- strsplit( lns, " +" )
> m <- do.call( rbind, lapply( L[-1], function(v) if ( length(v) < length( L[[1]] ) ) c( v, rep( NA, length( L[[1]] ) - length(v) ) ) else v ) )
> colnames( m ) <- L[[1]]
> result <- as.data.frame( m, stringsAsFactors = FALSE )
> result
>
> On February 22, 2021 4:42:57 PM PST, Val  wrote:
> >That is my problem. The spacing between columns is not consistent.  It
> >  may be  single space  or multiple spaces (two or three).
> >
> >On Mon, Feb 22, 2021 at 6:14 PM Bill Dunlap 
> >wrote:
> >>
> >> You said the column values were separated by space characters.
> >> Copying the text from gmail shows that some column names and column
> >> values are separated by single spaces (e.g., between x1 and x2) and
> >> some by multiple spaces (e.g., between x3 and x4.  Did the mail mess
> >> up the spacing or is there some other way to tell where the omitted
> >> values are?
> >>
> >> -Bill
> >>
> >> On Mon, Feb 22, 2021 at 2:54 PM Val  wrote:
> >> >
> >> > I Tried that one and it did not work. Please see the error message
> >> > Error in read.table(text = "x1  x2  x3 x4\n1 B12 \n2   C23
> >> > \n322 B32  D34 \n4D44 \n51 D53\n60 D62
> >",
> >> > :
> >> >   more columns than column names
> >> >
> >> > On Mon, Feb 22, 2021 at 5:39 PM Bill Dunlap
> > wrote:
> >> > >
> >> > > Since the columns in the file are separated by a space character,
> >" ",
> >> > > add the read.table argument sep=" ".
> >> > >
> >> > > -Bill
> >> > >
> >> > > On Mon, Feb 22, 2021 at 2:21 PM Val  wrote:
> >> > > >
> >> > > > Hi all, I am trying to read a messy data  but facing
> >difficulty.  The
> >> > > > data has several columns separated by blank space(s).  Each
> >column
> >> > > > value may have different lengths across the rows.   The first
> >> > > > row(header) has four columns. However, each row may not have
> >the four
> >> > > > column values.  For instance, the first data row has only the
> >first
> >> > > > two column values. The fourth data row has the first and last
> >column
> >> > > > values, the second and the third column values are missing for
> >this
> >> > > > row..  How do I read this data set correctly? Here is my sample
> >data
> >> > > > set, output and desired output.   To make it clear to each data
> >point
> >> > > > I have added the row and column numbers. I cannot use fixed
> >width
> >> > > > format reading because each row  may have different length for
> >a
> >> > > > given column.
> >> > > >
> >> > > > dat<-read.table(text="x1  x2  x3 x4
> >> > > > 1 B22
> >> > > > 2 C33
> >> > > > 322 B22  D34
> >> > > > 4 D44
> >> > > > 51 D53
> >> > > > 60 D62",header=T, fill=T,na.strings=c("","NA"))
> >> > > >
> >> > > > Output
> >> > > >   x1  x2 x3 x4
> >> > > > 1   1 B12  NA
> >> > > > 2   2C23   NA
> >> > > > 3 322  B32  D34   NA
> >> > > > 4   4   D44NA
> >> > > > 5  51 D53 NA
> >> > > > 6  60 D62NA
> >> > > >
> >> > > >
> >> > > > Desired output
>

Re: [R] Read

2021-02-22 Thread jim holtman
Messed up, I did not see your 'desired' output, which will be hard since there
is not a consistent number of spaces that would represent the desired
column number.  Do you have any hint as to how to interpret the spacing,
especially since you have several hundred more lines?  Is the output supposed
to be in 'fixed' fields?

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve?*
*Tell me what you want to do, not how you want to do it.*


On Mon, Feb 22, 2021 at 5:00 PM jim holtman  wrote:

> Try this:
>
> > library(tidyverse)
>
> > text <-  "x1  x2  x3 x4\n1 B12 \n2   C23 \n322 B32  D34 \n4
>D44 \n51 D53\n60 D62 "
>
> > # read in the data as characters and replace multiple blanks with single
> blank
> > input <- read_lines(text)
>
> > input <- str_replace_all(input, ' +', ' ')
>
> > mydata <- read_delim(input, ' ', col_names = TRUE)
> Warning: 5 parsing failures.
> row col  expected  actual    file
>   1  -- 4 columns 3 columns literal data
>   2  -- 4 columns 3 columns literal data
>   4  -- 4 columns 3 columns literal data
>   5  -- 4 columns 2 columns literal data
>   6  -- 4 columns 3 columns literal data
>
> > mydata
> # A tibble: 6 x 4
>  x1    x2    x3    x4
>  <chr> <chr> <chr> <lgl>
> 1 1     B12   NA    NA
> 2 2     C23   NA    NA
> 3 322   B32   D34   NA
> 4 4     D44   NA    NA
> 5 51    D53   NA    NA
> 6 60    D62   NA    NA
> >
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
>
>
>
> On Mon, Feb 22, 2021 at 4:49 PM Val  wrote:
>
>> That is my problem. The spacing between columns is not consistent.  It
>>   may be  single space  or multiple spaces (two or three).
>>
>> On Mon, Feb 22, 2021 at 6:14 PM Bill Dunlap 
>> wrote:
>> >
>> > You said the column values were separated by space characters.
>> > Copying the text from gmail shows that some column names and column
>> > values are separated by single spaces (e.g., between x1 and x2) and
>> > some by multiple spaces (e.g., between x3 and x4.  Did the mail mess
>> > up the spacing or is there some other way to tell where the omitted
>> > values are?
>> >
>> > -Bill
>> >
>> > On Mon, Feb 22, 2021 at 2:54 PM Val  wrote:
>> > >
>> > > I Tried that one and it did not work. Please see the error message
>> > > Error in read.table(text = "x1  x2  x3 x4\n1 B12 \n2   C23
>> > > \n322 B32  D34 \n4D44 \n51 D53\n60 D62 ",
>> > > :
>> > >   more columns than column names
>> > >
>> > > On Mon, Feb 22, 2021 at 5:39 PM Bill Dunlap 
>> wrote:
>> > > >
>> > > > Since the columns in the file are separated by a space character, "
>> ",
>> > > > add the read.table argument sep=" ".
>> > > >
>> > > > -Bill
>> > > >
>> > > > On Mon, Feb 22, 2021 at 2:21 PM Val  wrote:
>> > > > >
>> > > > > Hi all, I am trying to read a messy data  but facing
>> difficulty.  The
>> > > > > data has several columns separated by blank space(s).  Each column
>> > > > > value may have different lengths across the rows.   The first
>> > > > > row(header) has four columns. However, each row may not have the
>> four
>> > > > > column values.  For instance, the first data row has only the
>> first
>> > > > > two column values. The fourth data row has the first and last
>> column
>> > > > > values, the second and the third column values are missing for
>> this
>> > > > > row..  How do I read this data set correctly? Here is my sample
>> data
>> > > > > set, output and desired output.   To make it clear to each data
>> point
>> > > > > I have added the row and column numbers. I cannot use fixed width
>> > > > > format reading because each row  may have different length for  a
>> > > > > given column.
>> > > > >
>> > > > > dat<-read.table(text="x1  x2  x3 x4
>> > > > > 1 B22
>> > > > > 2 C33
>> > > > > 322 B22  D34
>> > > > > 4 

Re: [R] Read

2021-02-22 Thread jim holtman
Try this:

> library(tidyverse)

> text <-  "x1  x2  x3 x4\n1 B12 \n2   C23 \n322 B32  D34 \n4
 D44 \n51 D53\n60 D62 "

> # read in the data as characters and replace multiple blanks with single
blank
> input <- read_lines(text)

> input <- str_replace_all(input, ' +', ' ')

> mydata <- read_delim(input, ' ', col_names = TRUE)
Warning: 5 parsing failures.
row col  expected  actual    file
  1  -- 4 columns 3 columns literal data
  2  -- 4 columns 3 columns literal data
  4  -- 4 columns 3 columns literal data
  5  -- 4 columns 2 columns literal data
  6  -- 4 columns 3 columns literal data

> mydata
# A tibble: 6 x 4
 x1    x2    x3    x4
 <chr> <chr> <chr> <lgl>
1 1     B12   NA    NA
2 2     C23   NA    NA
3 322   B32   D34   NA
4 4     D44   NA    NA
5 51    D53   NA    NA
6 60    D62   NA    NA
>

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.




On Mon, Feb 22, 2021 at 4:49 PM Val  wrote:

> That is my problem. The spacing between columns is not consistent.  It
>   may be  single space  or multiple spaces (two or three).
>
> On Mon, Feb 22, 2021 at 6:14 PM Bill Dunlap 
> wrote:
> >
> > You said the column values were separated by space characters.
> > Copying the text from gmail shows that some column names and column
> > values are separated by single spaces (e.g., between x1 and x2) and
> > some by multiple spaces (e.g., between x3 and x4.  Did the mail mess
> > up the spacing or is there some other way to tell where the omitted
> > values are?
> >
> > -Bill
> >
> > On Mon, Feb 22, 2021 at 2:54 PM Val  wrote:
> > >
> > > I Tried that one and it did not work. Please see the error message
> > > Error in read.table(text = "x1  x2  x3 x4\n1 B12 \n2   C23
> > > \n322 B32  D34 \n4D44 \n51 D53\n60 D62 ",
> > > :
> > >   more columns than column names
> > >
> > > On Mon, Feb 22, 2021 at 5:39 PM Bill Dunlap 
> wrote:
> > > >
> > > > Since the columns in the file are separated by a space character, "
> ",
> > > > add the read.table argument sep=" ".
> > > >
> > > > -Bill
> > > >
> > > > On Mon, Feb 22, 2021 at 2:21 PM Val  wrote:
> > > > >
> > > > > Hi all, I am trying to read a messy data  but facing  difficulty.
> The
> > > > > data has several columns separated by blank space(s).  Each column
> > > > > value may have different lengths across the rows.   The first
> > > > > row(header) has four columns. However, each row may not have the
> four
> > > > > column values.  For instance, the first data row has only the first
> > > > > two column values. The fourth data row has the first and last
> column
> > > > > values, the second and the third column values are missing for this
> > > > > row..  How do I read this data set correctly? Here is my sample
> data
> > > > > set, output and desired output.   To make it clear to each data
> point
> > > > > I have added the row and column numbers. I cannot use fixed width
> > > > > format reading because each row  may have different length for  a
> > > > > given column.
> > > > >
> > > > > dat<-read.table(text="x1  x2  x3 x4
> > > > > 1 B22
> > > > > 2 C33
> > > > > 322 B22  D34
> > > > > 4 D44
> > > > > 51 D53
> > > > > 60 D62",header=T, fill=T,na.strings=c("","NA"))
> > > > >
> > > > > Output
> > > > >   x1  x2 x3 x4
> > > > > 1   1 B12  NA
> > > > > 2   2C23   NA
> > > > > 3 322  B32  D34   NA
> > > > > 4   4   D44NA
> > > > > 5  51 D53 NA
> > > > > 6  60 D62NA
> > > > >
> > > > >
> > > > > Desired output
> > > > >x1   x2 x3   x4
> > > > > 1   1B22   NA
> > > > > 2   2 C33 NA
> > > > > 3 322  B32NA  D34
> > > > > 4   4  NA  D44
> > > > > 5  51D53 NA
> > > > > 6  60   D62  NA
> > > > >

Re: [R] Get 3 values not only 1

2021-01-27 Thread jim holtman
Is this what you are after?  You need to store a vector in the list:

> 
> # Data
> PIB.hab<-c(12000,34000,25000,43000,12500,32400,76320,45890,76345,90565,76580,45670,23450,34560,65430,65435,56755,87655,90755,45675)
> ISQ.2018<-c(564,587,489,421,478,499,521,510,532,476,421,467,539,521,478,532,449,487,465,500)
>
> Dataset=data.frame(ISQ.2018,PIB.hab)
>
> #plot
> plot(ISQ.2018,PIB.hab)
> plot(ISQ.2018,PIB.hab, main="Droite de régression linéaire", xlab="Score ISQ 
> 2018", ylab="PIB/hab")
>
> #OLS fit
> fit1<-lm(PIB.hab~ISQ.2018)
> lines(ISQ.2018, fitted(fit1), col="blue", lwd=2)
>
> # Create a list to store the results
> lst<-list()
>
> # This statement does the repetitions (looping)
>
> for(i in 1 :1000)
+ {
+
+   n=dim(Dataset)[1]
+   p=0.667
+   sam<-sample(1 :n,floor(p*n),replace=FALSE)
+   Training <-Dataset [sam,]
+   Testing <- Dataset [-sam,]
+   fit2<-lm(PIB.hab~ISQ.2018)
+   ypred<-predict(fit2,newdata=Testing)
+   y<-Dataset[-sam,]$PIB.hab
+   MSE <- mean((y-ypred)^2)
+   biais <- mean(ypred-y)
+   variance <-mean((ypred- mean(ypred))^2)
+
+   lst[[i]] <- c(MSE = MSE,
+biais = biais,
+variance = variance)
+   # lst[i]<-MSE
+   # lst[i]<-biais
+   # lst[i]<-variance
+
+ }
>
> # convert to a matrix
>
> x <- as.matrix(do.call(rbind, lst))
> colMeans(x)
   MSE biais  variance
  5.418175e+08 -4.524548e+01  6.321856e+07
>
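One detail worth noting (my observation, not raised in the thread): inside the loop, `fit2` is fit on the full data set rather than the training split, so the resampling never changes the fitted model. Passing the data explicitly fixes that:

```r
# Fit on the training split only; without data = Training, lm() uses the
# free-standing full-length vectors and the train/test split has no effect.
fit2  <- lm(PIB.hab ~ ISQ.2018, data = Training)
ypred <- predict(fit2, newdata = Testing)
```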

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



On Wed, Jan 27, 2021 at 12:37 PM varin sacha via R-help
 wrote:
>
> Dear R-experts,
>
> Below is my R code, which works, but I get only 1 value instead of the 3 I
> want. The value I get is, according to my R code, the variance. My goal is to
> get 3 values: the bias, the variance and the MSE. How can I solve
> my problem?
>
> Many thanks.
>
> 
> # Data
> PIB.hab<-c(12000,34000,25000,43000,12500,32400,76320,45890,76345,90565,76580,45670,23450,34560,65430,65435,56755,87655,90755,45675)
> ISQ.2018<-c(564,587,489,421,478,499,521,510,532,476,421,467,539,521,478,532,449,487,465,500)
>
> Dataset=data.frame(ISQ.2018,PIB.hab)
>
> #plot
> plot(ISQ.2018,PIB.hab)
> plot(ISQ.2018,PIB.hab, main="Droite de régression linéaire", xlab="Score ISQ 
> 2018", ylab="PIB/hab")
>
> #OLS fit
> fit1<-lm(PIB.hab~ISQ.2018)
> lines(ISQ.2018, fitted(fit1), col="blue", lwd=2)
>
> # Create a list to store the results
> lst<-list()
>
> # This statement does the repetitions (looping)
>
> for(i in 1 :1000)
> {
>
> n=dim(Dataset)[1]
> p=0.667
> sam<-sample(1 :n,floor(p*n),replace=FALSE)
> Training <-Dataset [sam,]
> Testing <- Dataset [-sam,]
> fit2<-lm(PIB.hab~ISQ.2018)
> ypred<-predict(fit2,newdata=Testing)
> y<-Dataset[-sam,]$PIB.hab
> MSE <- mean((y-ypred)^2)
> biais <- mean(ypred-y)
> variance <-mean((ypred- mean(ypred))^2)
>
> lst[i]<-MSE
> lst[i]<-biais
> lst[i]<-variance
>
> }
> mean(unlist(lst))
> 
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] seq.Date when date is the last date of the month

2021-01-07 Thread jim holtman
Yes, it is the expected behaviour if you check the documentation:

Using "month" first advances the month without changing the day: if
this results in an invalid day of the month, it is counted forward
into the next month: see the examples.
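To illustrate the documented behaviour, and one common workaround (my suggestion, not from this thread): generate first-of-month dates, which are always valid, and subtract one day to land on month ends.

```r
# Stepping by month from a day-31 start "counts forward" past short months:
seq(as.Date("2012-08-31"), by = "1 month", length.out = 2)
# [1] "2012-08-31" "2012-10-01"

# Workaround: step from the first of the *following* month, then subtract 1 day
firsts <- seq(as.Date("2012-09-01"), by = "1 month", length.out = 3)
firsts - 1
# [1] "2012-08-31" "2012-09-30" "2012-10-31"
```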

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



On Thu, Jan 7, 2021 at 11:20 AM Jeremie Juste  wrote:
>
> Hello,
>
> I recently bumped into a behavior that surprised me.
> When performing the following command, I would expect the second
> argument to be "2012-09-30" but got "2012-10-01" instead
> > seq(as.Date("2012-08-31"),by="1 month",length=2)
> [1] "2012-08-31" "2012-10-01"
>
> When the same command is performed for the start of the month, I get the
> result I expect.
> > seq(as.Date("2012-08-01"),by="1 month",length=2)
> [1] "2012-08-01"
>
>
> Is there an explanation for this behavior?
>
> Best regards,
> --
> Jeremie Juste
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Error: Discrete value supplied to continuous variable

2020-12-28 Thread jim holtman
You set up your x and y axes incorrectly. In your call to ggplot() you have:

g <-df %>%
  ggplot(aes(x=reorder(job,-span), y=span, fill=factor(job))) +

but in your call to geom_rect() you are using a completely different set
of variables, which is what causes the error:

 geom_rect(aes(xmin = ID - w/2,
xmax = ID + w/2,
ymin = ymin,
ymax = ymax,
fill = factor(job)), alpha=0.25)

I changed the call to ggplot() so that it uses the same variable types
and got a plot out of it; is this what you were expecting?

  ggplot(aes(x=ID, y=ymin, fill=factor(job))) +



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



On Mon, Dec 28, 2020 at 4:33 PM King, Barry  wrote:
>
> I am attempting to convert an original schedule to a longest-operating-time-next
> schedule and then create a waterfall plot. I am having trouble with the plot.
> I get an "Error: Discrete value supplied to continuous variable" message. My
> example code appears below.
>
> library(tidyverse)
> library(ggplot2)
>
> # original schedule of four jobs
> df <- data.frame(job=c("A","B","C","D"),
>  original_begin=c("2021-01-05 07:00:00", "2021-05-01 
> 08:30:00",
>  "2021-05-01 10:30:00", "2021-05-01 
> 14:00:00"),
>  original_end=c("2021-01-05 08:30:00", "2021-05-01 10:30:00",
>"2021-05-01 14:00:00", "2021-05-01 16:00:00"))
>
> # represent date/times as POSIXct objects
> df$original_begin <- as.POSIXct(df$original_begin)
> df$original_end <- as.POSIXct(df$original_end)
>
> # calculate span, length of time each job takes
> df$span <- as.numeric(difftime(df$original_end,df$original_begin))
>
> # sort by span descending
> df <- arrange(df,-span)
>
> # assign ID now that df is correctly sorted
> df$ID <- as.numeric(rownames(df))
>
> # calculate ymin and ymax
> df$ymin[1] <- min(df$original_begin)
> for (i in 1:(nrow(df)-1)) {
>   df$ymax[i] <- df$ymin[i] + df$span[i]
>   df$ymin[i+1] <- df$ymax[i]
> }
> df$ymax[nrow(df)] <- df$ymin[nrow(df)] +
>   df$span[nrow(df)]
>
> # following is loosely based on tutorial found at
> # https://www.stomperusa.com/2019/05/27/basic-waterfall-graphs-in-r/
>
> # set up plot canvas, longest job first (see x=reorder(job,-span))
> g <-df %>%
>   ggplot(aes(x=reorder(job,-span), y=span, fill=factor(job))) +
>   theme_classic() +
>   theme(legend.title=element_blank())+
>   theme(legend.position = "right", panel.grid = element_blank(),
> axis.text.x = element_text(angle = 90, vjust = 0.5)) +
>   labs(y = "Hours", x = "Job")
> g  # seems to be working as expected through here
>
> w <- 0.5  # use to set width of bars
>
> # attempt to create waterfall plot
> g <- g +
>   geom_rect(aes(xmin = ID - w/2,
> xmax = ID + w/2,
> ymin = ymin,
> ymax = ymax,
> fill = factor(job)), alpha=0.25)
> g
>
> Any assistance is appreciated
>
>
> Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Replace double slashes with single backslash

2020-12-28 Thread jim holtman
Why do you want to replace '\\' with '\' in the file names?  They are
actually single '\' in the character string, but are printing out as '\\'.
See the example below:

> x <- 'a\\b'
> x
[1] "a\\b"
> nchar(x)
[1] 3
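
To confirm this for yourself (a small addition to the example above): print the string without the escaping that the default print method adds.

```r
x <- 'a\\b'
print(x)       # "a\\b"  -- print() escapes the backslash for display
cat(x, "\n")   # a\b     -- the string really holds one backslash
writeLines(x)  # a\b
```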

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Mon, Dec 28, 2020 at 1:20 PM Bert Gunter  wrote:

> "\" is an escape in R. See ?Quotes for details.
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Mon, Dec 28, 2020 at 12:56 PM Anbu A  wrote:
>
> > Hi All,
> > I am able to replace "r" with "x" for the word "Users" just for a test
> run.
> > *Code: newlist %>% mutate(.,new_col=str_replace(fpath,"r","x"))  *- this
> > works fine
> > But when I try to replace "\\" with "\".
> > *newlist %>% mutate(.,new_col=str_replace(fpath,"\\","\")) *, I get a
> > prompt ">" to complete the code. Not working. There is something on
> > backslashes to be "masked".
> > Any help would be appreciated.
> >
> >fpath new_col
> > 1 C:\\Users\\Anbu\\Desktop\\sas\\ C:\\Usexs\\Anbu\\Desktop\\sas\\
> > 2 C:\\Users\\Anbu\\Desktop\\sas\\ C:\\Usexs\\Anbu\\Desktop\\sas\\
> > 3 C:\\Users\\Anbu\\Desktop\\sas\\ C:\\Usexs\\Anbu\\Desktop\\sas\\
> > 4 C:\\Users\\Anbu\\Desktop\\sas\\ C:\\Usexs\\Anbu\\Desktop\\sas\\
> > 5 C:\\Users\\Anbu\\Desktop\\sas\\ C:\\Usexs\\Anbu\\Desktop\\sas\\
> >
> > Thanks,
> > Anbu.
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] fusion of two matrices (numerical and logical)

2020-09-18 Thread jim holtman
Here is a way of doing it using the 'arr.ind' option in 'which'

> A <- 1:20
> B <- matrix(A,nrow=5,ncol=4)
> B
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
> # B is a numerical matrix
> C <- B<7
> C[4,4] <- TRUE
> C
 [,1]  [,2]  [,3]  [,4]
[1,] TRUE  TRUE FALSE FALSE
[2,] TRUE FALSE FALSE FALSE
[3,] TRUE FALSE FALSE FALSE
[4,] TRUE FALSE FALSE  TRUE
[5,] TRUE FALSE FALSE FALSE
>
> # initialize a 'result' with zeros
> result <- array(0, dim = dim(B))
>
> # get the indices of values to replace
> indx <- which(C, arr.ind = TRUE)
>
> result[indx] <- B[indx]
>
> result
     [,1] [,2] [,3] [,4]
[1,]    1    6    0    0
[2,]    2    0    0    0
[3,]    3    0    0    0
[4,]    4    0    0   19
[5,]    5    0    0    0
>
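
Two equivalent vectorised one-liners that avoid the explicit index step (a sketch, not from the thread): `ifelse()` keeps the `dim` attribute of its test argument, and multiplying by a logical matrix coerces TRUE/FALSE to 1/0.

```r
B <- matrix(1:20, nrow = 5, ncol = 4)
C <- B < 7
C[4, 4] <- TRUE

r1 <- ifelse(C, B, 0)  # keeps dim() of C; zeros where C is FALSE
r2 <- B * C            # TRUE/FALSE coerced to 1/0, so FALSE cells become 0
```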


Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Sat, Sep 5, 2020 at 11:51 AM Bert Gunter  wrote:

> A is not a matrix. I presume you meant B. If so:
>
> > B[!C] <- 0
> > B
>      [,1] [,2] [,3] [,4]
> [1,]    1    6    0    0
> [2,]    2    0    0    0
> [3,]    3    0    0    0
> [4,]    4    0    0   19
> [5,]    5    0    0    0
>
> Cheers,
> Bert
>
>
>
>
>
> On Sat, Sep 5, 2020 at 11:18 AM Vivek Sutradhara 
> wrote:
>
> > Hi
> > I would like to get help in combining two matrices. Here is my example:
> > A <- 1:20
> > B <- matrix(A,nrow=5,ncol=4)
> > # B is a numerical matrix
> > C <- B<7
> > C[4,4] <- TRUE
> > # C is a logical matrix
> > # if I combine A and C, I get a vector
> > D1 <- A[C==TRUE]
> > D1
> > D2 <- A[C==FALSE]
> > D2
> >
> > I want to get a matrix with the same dimensions as matrix A. At the
> > coordinates given by the vector D1, I want to retain the values in
> > matrix A. At the locations in D2, I want a zero value.
> > I want to know if I can do this without using any loops.
> > Thanks, Vivek
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with locating error on import of data

2020-06-23 Thread jim holtman
One of the problems with Excel is that people can put anything in any
column. You might want to restrict which columns you are reading:
if read_excel finds data in cells that have no header, it will invent a
column name for them.
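A sketch of that restriction (the file name and cell range here are placeholders, not from the thread): read_excel() takes a `range` argument that limits the read to a known rectangle.

```r
library(readxl)

# Only cells inside A1:F200 are considered, so stray values outside this
# rectangle can no longer generate `...6`-style invented column names.
balance_sheet <- read_excel("Balance_sheet.xlsx", range = "A1:F200")
```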

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Tue, Jun 23, 2020 at 8:09 AM Patrick (Malone Quantitative)
 wrote:
>
> It looks like it's looking for column names in the first row of your Excel
> sheet and not finding them. What does the first row contain?
>
> On Tue, Jun 23, 2020 at 10:57 AM Ahson via R-help 
> wrote:
>
> > I have imported data from an Excel file and I am getting errors:
> >
> > > library(readxl)
> > > Balance_sheet <- read_excel("Y:/All Documents/Training/Data/Routines for
> > consolidating all the data/Individual tables/AIM
> > companies/Balance_sheet.xlsx", na = "")
> > New names:
> > * `` -> ...6
> > * `` -> ...7
> > * `` -> ...9
> > * `` -> ...10
> > * `` -> ...11
> > * ... and 22 more problems
> >
> >
> > How can I find where the error is originating? What does New names mean?
> >
> > Thanks in advance for your help.
> >
> >
> > Sent from Mail for Windows 10
> >
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> Patrick S. Malone, Ph.D., Malone Quantitative
> NEW Service Models: http://malonequantitative.com
>
> He/Him/His
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Creating file from raw content

2020-06-01 Thread jim holtman
You can read it in as 'raw'


input <- file('your.xlsx', open = 'rb')  # open as binary
excel_file <- readBin(input, raw(), 1e8)  # make sure you read in all the file
close(input)

output <- file('your_new.xlsx', 'wb')
writeBin(excel_file, output)
close(output)
=======
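
A small refinement (my suggestion, not part of the original reply): file.size() returns the exact byte count, so there is no need to guess an upper bound like 1e8, and readBin()/writeBin() accept file names directly and handle opening/closing for you.

```r
src <- 'your.xlsx'   # placeholder name, as in the reply above
bytes <- readBin(src, raw(), n = file.size(src))  # read exactly the whole file
writeBin(bytes, 'your_new.xlsx')
```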


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



On Fri, May 29, 2020 at 12:12 PM Sebastien Bihorel via R-help
 wrote:
>
> Hi,
>
> Let's say I can extract the content of an Excel .xlsx file stored in a 
> database and store it as raw content in an R object. What would be the proper 
> way to "create" a .xlsx file and "transfer" the content of this obj into it? 
> I took the example of an Excel file, but my question would extend to any kind 
> of binary file.
>
> Thank you in advance for your input
>
> Sebastien
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Conditions in R (Help Post)

2019-10-22 Thread jim holtman
Here is another way of doing it by computing the index based on the
conditions

> input <- read_delim(" YEAR   DAY  X Y   Sig
+   1981 9 -0.213 1.08   1.10
+   198110  0.065 1.05   1.05", delim = ' ', trim_ws = TRUE)
>
> input <- mutate(input,
+   phase = case_when(X < 0 & Y < 0 & Y < X ~ 'phase=1',
+ X < 0 & Y < 0 & Y > X ~ 'phase=2',
+ X < 0 & Y > 0 & Y < X ~ 'phase=7',
+ X < 0 & Y > 0 & Y > X ~ 'phase=8',
+ X > 0 & Y < 0 & Y < X ~ 'phase=3',
+ X > 0 & Y < 0 & Y > X ~ 'phase=4',
+ X > 0 & Y > 0 & Y > X ~ 'phase=6',
+ X > 0 & Y > 0 & Y < X ~ 'phase=5',
+ TRUE ~ 'unknown'
+   )
+ )
> input
# A tibble: 2 x 6
   YEAR   DAY  X Y   Sig phase
   <dbl> <dbl>  <dbl> <dbl> <dbl> <chr>
1  1981 9 -0.213  1.08  1.1  phase=8
2  198110  0.065  1.05  1.05 phase=6
>
> # another way of doing it by constructing an integer to be used as
> # index for the phase value based on the evaluation of X<0, Y<0 and Y
> # index for the phase value based on the evaluation of X<0, Y<0 and Y<X
+   (X < 0) * 4 + (Y < 0) * 2 + (Y < X)
+ )
>
> phase_val <- c(6, 5, 4, 3, 8, 7, 2, 1)
>
> input$phase_2 <- paste0('phase=', phase_val[index + 1L])
> input
# A tibble: 2 x 7
   YEAR   DAY  X Y   Sig phase   phase_2
   <dbl> <dbl>  <dbl> <dbl> <dbl> <chr>   <chr>
1  1981 9 -0.213  1.08  1.1  phase=8 phase=8
2  198110  0.065  1.05  1.05 phase=6 phase=6
>

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Tue, Oct 22, 2019 at 11:28 AM jim holtman  wrote:

> Had the condition for phase=2 incorrect:
>
> library(tidyverse)
> input <- read_delim(" YEAR   DAY  X Y   Sig
>   1981 9 -0.213 1.08   1.10
>   198110  0.065 1.05   1.05", delim = ' ', trim_ws = TRUE)
>
> input <- mutate(input,
>   phase = case_when(X < 0 & Y < 0 & Y < X ~ 'phase=1',
> X < 0 & Y < 0 & Y > X ~ 'phase=2',
> X < 0 & Y > 0 & Y < X ~ 'phase=7',
>     X < 0 & Y > 0 & Y > X ~ 'phase=8',
> X > 0 & Y < 0 & Y < X ~ 'phase=3',
> X > 0 & Y < 0 & Y > X ~ 'phase=4',
> X > 0 & Y > 0 & Y > X ~ 'phase=6',
> X > 0 & Y > 0 & Y < X ~ 'phase=5',
> TRUE ~ 'unknown'
>   )
> )
>
> Jim Holtman
> *Data Munger Guru*
>
>
> *What is the problem that you are trying to solve? Tell me what you want to
> do, not how you want to do it.*
>
>
> On Tue, Oct 22, 2019 at 11:20 AM jim holtman  wrote:
>
>> Here is one way of doing it; I think the output you show is wrong:
>>
>> library(tidyverse)
>> input <- read_delim(" YEAR   DAY  X Y   Sig
>>   1981 9 -0.213 1.08   1.10
>>   198110  0.065 1.05   1.05", delim = ' ', trim_ws = TRUE)
>>
>> input <- mutate(input,
>>   phase = case_when(X < 0 & Y < 0 & Y < X ~ 'phase=1',
>> X < 0 & Y > 0 & Y < X ~ 'phase=2',
>>         X < 0 & Y > 0 & Y < X ~ 'phase=7',
>> X < 0 & Y > 0 & Y > X ~ 'phase=8',
>> X > 0 & Y < 0 & Y < X ~ 'phase=3',
>> X > 0 & Y < 0 & Y > X ~ 'phase=4',
>> X > 0 & Y > 0 & Y > X ~ 'phase=6',
>> X > 0 & Y > 0 & Y < X ~ 'phase=5',
>> TRUE ~ 'unknown'
>>   )
>> )
>>
>> > input
>> # A tibble: 2 x 6
>>YEAR   DAY  X Y   Sig phase
>> 
>> 1  1981 9 -0.213  1.08  1.1  phase=8
>> 2  198110  0.065  1.05  1.05 phase=6
>>
>> Jim Holtman
>> *Data Munger Guru*
>>
>>
>> *What is the problem that you are trying to solve? Tell me what you want
>> to do, not how you want to do it.*
>>
>>
>> On Tue, Oct 22, 2019 at 9:43 AM Yeasmin Alea 
>> wrote:
>>
>>> Hello Team
>>> I would like to add a new column (for example-Phase) from the below data
>>> set based on the conditions
>>>YEAR   DAY  X Y   Sig
>>>  1  1981 9 -0.213 1.08   1.10
>>>  2  198110  0.065 1.05   1.05
>>> *Conditions*
>>>
>>> D$Phase=sapply(D

Re: [R] Conditions in R (Help Post)

2019-10-22 Thread jim holtman
Had the condition for phase=2 incorrect:

library(tidyverse)
input <- read_delim(" YEAR   DAY  X Y   Sig
  1981 9 -0.213 1.08   1.10
  198110  0.065 1.05   1.05", delim = ' ', trim_ws = TRUE)

input <- mutate(input,
  phase = case_when(X < 0 & Y < 0 & Y < X ~ 'phase=1',
X < 0 & Y < 0 & Y > X ~ 'phase=2',
X < 0 & Y > 0 & Y < X ~ 'phase=7',
X < 0 & Y > 0 & Y > X ~ 'phase=8',
X > 0 & Y < 0 & Y < X ~ 'phase=3',
X > 0 & Y < 0 & Y > X ~ 'phase=4',
X > 0 & Y > 0 & Y > X ~ 'phase=6',
X > 0 & Y > 0 & Y < X ~ 'phase=5',
TRUE ~ 'unknown'
  )
)

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Tue, Oct 22, 2019 at 11:20 AM jim holtman  wrote:

> Here is one way of doing it; I think the output you show is wrong:
>
> library(tidyverse)
> input <- read_delim(" YEAR   DAY  X Y   Sig
>   1981 9 -0.213 1.08   1.10
>   198110  0.065 1.05   1.05", delim = ' ', trim_ws = TRUE)
>
> input <- mutate(input,
>   phase = case_when(X < 0 & Y < 0 & Y < X ~ 'phase=1',
> X < 0 & Y > 0 & Y < X ~ 'phase=2',
> X < 0 & Y > 0 & Y < X ~ 'phase=7',
> X < 0 & Y > 0 & Y > X ~ 'phase=8',
> X > 0 & Y < 0 & Y < X ~ 'phase=3',
> X > 0 & Y < 0 & Y > X ~ 'phase=4',
> X > 0 & Y > 0 & Y > X ~ 'phase=6',
> X > 0 & Y > 0 & Y < X ~ 'phase=5',
> TRUE ~ 'unknown'
>   )
> )
>
> > input
> # A tibble: 2 x 6
>YEAR   DAY  X Y   Sig phase
> 
> 1  1981 9 -0.213  1.08  1.1  phase=8
> 2  198110  0.065  1.05  1.05 phase=6
>
> Jim Holtman
> *Data Munger Guru*
>
>
> *What is the problem that you are trying to solve? Tell me what you want to
> do, not how you want to do it.*
>
>
> On Tue, Oct 22, 2019 at 9:43 AM Yeasmin Alea 
> wrote:
>
>> Hello Team
>> I would like to add a new column (for example-Phase) from the below data
>> set based on the conditions
>>YEAR   DAY  X Y   Sig
>>  1  1981 9 -0.213 1.08   1.10
>>  2  198110  0.065 1.05   1.05
>> *Conditions*
>>
>> D$Phase=sapply(D,function(a,b) {
>>  a <-D$X
>>  b<-D$Y
>>  if (a<0 && b<0 && b<a) {phase=1} else if (a<0 && b<0 && b>a)
>> {phase=2} else if (a<0 && b>0 && b<a) {phase=7} else if (a<0 && b>0 && b>a)
>> {phase=8} else if (a>0 && b<0 && b<a) {phase=3} else if (a>0 && b<0 && b>a)
>> {phase=4} else if (a>0 && b>0 && b>a)
>> {phase=6} else if (a>0 && b>0 && b<a) {phase=5}
>> })
>>
>> Can anyone help to fix the script to get a Phase column based on the
>> conditions. The table will be like the below
>>YEAR   DAY  X Y   Sig  Phase
>>  1  1981 9 -0.213 1.08   1.10   phase=7
>>  2  198110  0.065 1.05   1.05   phase=6
>>
>> Many thanks
>> Alea
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Conditions in R (Help Post)

2019-10-22 Thread jim holtman
Here is one way of doing it; I think the output you show is wrong:

library(tidyverse)
input <- read_delim(" YEAR   DAY  X Y   Sig
  1981 9 -0.213 1.08   1.10
  198110  0.065 1.05   1.05", delim = ' ', trim_ws = TRUE)

input <- mutate(input,
  phase = case_when(X < 0 & Y < 0 & Y < X ~ 'phase=1',
X < 0 & Y > 0 & Y < X ~ 'phase=2',
X < 0 & Y > 0 & Y < X ~ 'phase=7',
X < 0 & Y > 0 & Y > X ~ 'phase=8',
X > 0 & Y < 0 & Y < X ~ 'phase=3',
X > 0 & Y < 0 & Y > X ~ 'phase=4',
X > 0 & Y > 0 & Y > X ~ 'phase=6',
X > 0 & Y > 0 & Y < X ~ 'phase=5',
TRUE ~ 'unknown'
  )
)

> input
# A tibble: 2 x 6
   YEAR   DAY  X Y   Sig phase
   <dbl> <dbl>  <dbl> <dbl> <dbl> <chr>
1  1981 9 -0.213  1.08  1.1  phase=8
2  198110  0.065  1.05  1.05 phase=6

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Tue, Oct 22, 2019 at 9:43 AM Yeasmin Alea  wrote:

> Hello Team
> I would like to add a new column (for example-Phase) from the below data
> set based on the conditions
>YEAR   DAY  X Y   Sig
>  1  1981 9 -0.213 1.08   1.10
>  2  198110  0.065 1.05   1.05
> *Conditions*
>
> D$Phase=sapply(D,function(a,b) {
>  a <-D$X
>  b<-D$Y
>  if (a<0 && b<0 && b<a) {phase=1} else if (a<0 && b<0 && b>a)
> {phase=2} else if (a<0 && b>0 && b<a) {phase=7} else if (a<0 && b>0 && b>a)
> {phase=8} else if (a>0 && b<0 && b<a) {phase=3} else if (a>0 && b<0 && b>a)
> {phase=4} else if (a>0 && b>0 && b>a)
> {phase=6} else if (a>0 && b>0 && b<a) {phase=5}
> })
>
> Can anyone help to fix the script to get a Phase column based on the
> conditions. The table will be like the below
>YEAR   DAY  X Y   Sig  Phase
>  1  1981 9 -0.213 1.08   1.10   phase=7
>  2  198110  0.065 1.05   1.05   phase=6
>
> Many thanks
> Alea
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Query about calculating the monthly average of daily data columns

2019-10-20 Thread jim holtman
Does this do what you want:

> library(tidyverse)

> input <- read_delim("PERMNO DATE Spread
+ 111 19940103 0.025464308
+ 111 19940104 0.064424296
+ 111 19940105 0.018579337
+ 111 19940106 0.018872211
 ..." ... [TRUNCATED]

> # drop last two digits to get the month
> monthly <- input %>%
+   group_by(PERMNO, month = DATE %/% 100) %>%
+   summarise(avg = mean(Spread))
> monthly
# A tibble: 12 x 3
# Groups:   PERMNO [3]
   PERMNO  month  avg
    <dbl>  <dbl>    <dbl>
 1111 199401 0.0416
 2111 199402 0.0508
 3111 199403 0.0567
 4111 199404 0.0466
 5112 199401 0.000533
 6112 199402 0.000593
 7112 199403 0.000471
 8112 199404 0.000587
 9113 199401 0.000692
10113 199402 0.000591
11113 199403 0.000677
12    113 199404 0.000555
>
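
The same grouping works in base R (a sketch, assuming `input` holds the PERMNO/DATE/Spread columns shown above): integer division by 100 drops the day digits, and aggregate() averages within each PERMNO-month.

```r
input$month <- input$DATE %/% 100   # e.g. 19940103 -> 199401
monthly <- aggregate(Spread ~ PERMNO + month, data = input, FUN = mean)
monthly[order(monthly$PERMNO, monthly$month), ]
```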


Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*


On Sun, Oct 20, 2019 at 5:10 AM Subhamitra Patra 
wrote:

> Dear Sir,
>
> Thank you very much for your suggestions.
>
> Due to certain inconveniences, I was unable to work on your suggestions.
>
> Today I worked on both suggestions and got the result that I really wanted:
> the monthly averages for each country.
>
> Here, I am asking one more query (just for learning purpose) that if my
> country name and its respective variable is in the panel format, and I want
> to take the monthly average for each country, how the code will be
> arranged. For your convenience, I am providing a small data sample below.
>
> PERMNO DATE Spread
> 111 19940103 0.025464308
> 111 19940104 0.064424296
> 111 19940105 0.018579337
> 111 19940106 0.018872211
> 111 19940107 0.065279782
> 111 19940110 0.063485905
> 111 19940111 0.018355453
> 111 19940112 0.064135683
> 111 19940113 0.063519987
> 111 19940114 0.018277351
> 111 19940117 0.018628417
> 111 19940118 0.065630229
> 111 19940119 0.018713152
> 111 19940120 0.019119037
> 111 19940121 0.068342043
> 111 19940124 0.020843244
> 111 19940125 0.019954211
> 111 19940126 0.018980321
> 111 19940127 0.066827165
> 111 19940128 0.067459235
> 111 19940131 0.068682559
> 111 19940201 0.02081465
> 111 19940202 0.068236091
> 111 19940203 0.068821406
> 111 19940204 0.020075648
> 111 19940207 0.066070584
> 111 19940208 0.066068837
> 111 19940209 0.019077072
> 111 19940210 0.065894875
> 111 19940211 0.018847478
> 111 19940214 0.065040844
> 111 19940215 0.01880332
> 111 19940216 0.018836199
> 111 19940217 0.06665
> 111 19940218 0.067116793
> 111 19940221 0.068809742
> 111 19940222 0.068230213
> 111 19940223 0.069502855
> 111 19940224 0.070383523
> 111 19940225 0.020430811
> 111 19940228 0.067087257
> 111 19940301 0.066776479
> 111 19940302 0.019959031
> 111 19940303 0.066596469
> 111 19940304 0.019131334
> 111 19940307 0.019312528
> 111 19940308 0.067349909
> 111 19940309 0.068916431
> 111 19940310 0.068620043
> 111 19940311 0.070494844
> 111 19940314 0.071056842
> 111 19940315 0.071042517
> 111 19940316 0.072401771
> 111 19940317 0.071940001
> 111 19940318 0.07352884
> 111 19940321 0.072671688
> 111 19940322 0.072652595
> 111 19940323 0.021352138
> 111 19940324 0.069933727
> 111 19940325 0.068717467
> 111 19940328 0.020470748
> 111 19940329 0.020003748
> 111 19940330 0.065833717
> 111 19940331 0.065268388
> 111 19940401 0.018762356
> 111 19940404 0.064914179
> 111 19940405 0.064706743
> 111 19940406 0.018764175
> 111 19940407 0.06524806
> 111 19940408 0.018593449
> 111 19940411 0.064913949
> 111 19940412 0.01872089
> 111 19940413 0.018729328
> 111 19940414 0.018978773
> 111 19940415 0.065477137
> 111 19940418 0.064614365
> 111 19940419 0.064184148
> 111 19940420 0.018553192
> 111 19940421 0.066872771
> 111 19940422 0.06680782
> 111 19940425 0.067467961
> 111 19940426 0.02014297
> 111 19940427 0.062464016
> 111 19940428 0.062357052
> 112 19940429 0.000233993
> 112 19940103 0.000815264
> 112 19940104 0.000238165
> 112 19940105 0.000813632
> 112 19940106 0.000236915
> 112 19940107 0.000809102
> 112 19940110 0.000801642
> 112 19940111 0.000797932
> 112 19940112 0.000795251
> 112 19940113 0.000795186
> 112 19940114 0.000231359
> 112 19940117 0.000232134
> 112 19940118 0.000233718
> 112 19940119 0.000233993
> 112 19940120 0.000234694
> 112 19940121 0.000235753
> 112 19940124 0.000808653
> 112 19940125 0.000235604
> 112 19940126 0.000805068
> 112 19940127 0.000802337
> 112 19940128 0.000801768
> 112 19940131 0.000233517
> 112 19940201 0.000797431
> 112 19940202 0.00028
> 112 19940203 0.000233826
> 112 19940204 0.000799519
> 112 19940207 0.000798105
> 112 19940208 0.000792245
> 112 199402

Re: [R] Installing multiple packages fails

2019-08-09 Thread jim holtman
The first parameter needs to be a character vector:

 install.packages(c("Blossom", "INLA", "RTisean", "RcppProgress", "STRbook",
                    "askpass", "classInt", "ellipsis", "generics", "lpSolve",
                    "odesolve", "ranger", "sf", "sys", "units"))
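
The warning in the original post follows from positional matching: the
second formal argument of install.packages() is lib, so a bare "INLA" as
the second argument is taken as the library path to install into. A
minimal sketch (package names taken from the post; the actual install call
is left commented out):

```r
# The first two formal arguments of install.packages() are `pkgs` and `lib`,
# so install.packages("Blossom", "INLA") tries to install "Blossom"
# into a library directory called "INLA".
names(formals(install.packages))[1:2]  # "pkgs" "lib"

# Correct form: one character vector as the first argument
pkgs <- c("Blossom", "INLA", "RTisean")
# install.packages(pkgs)
```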

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve?*
*Tell me what you want to do, not how you want to do it.*


On Fri, Aug 9, 2019 at 9:16 AM Rich Shepard 
wrote:

> Running 3.6.1 here and migrating from my old 32-bit server/workstation to a
> new 64-bit server/workstation (both running fully patched Slackware-14.2).
>
> On the old host .libPaths() returns "/usr/lib/R/library"; on the new host
> it
> returned nothing so I ran .libPaths("/usr/lib64/R/library") to create the
> system-wide library.
>
> First question is how I was able to print a dataframe list of installed
> libraries on the new host if the path was not defined?
>
> Second question is why I get an error on the new host after defining the
> library and running the install.packages() function:
>
> > install.packages("Blossom","INLA","RTisean","RcppProgress","STRbook",
>
> "askpass","classInt","ellipsis","generics","lpSolve","odesolve","ranger","sf",
> "sys","units")
>
> Warning in install.packages("Blossom", "INLA", "RTisean", "RcppProgress", :
>'lib = "INLA"' is not writable
>
> What have I missed or done incorrectly?
>
> TIA,
>
> Rich
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Structuring data for Correspondence Analysis

2019-03-29 Thread jim holtman
I am not familiar with SAS, so what would you like your output to look like?
There is the 'table' function that might do the job, and there is always
'dplyr', which can do the hard stuff. So we need more information on what
you want.
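
For instance, if the goal is the two-way tabulation that proc corresp
performs, one possible starting point (a sketch assuming the survey
columns are categorical, as in the head() output, with hypothetical rows):

```r
library(ca)

# hypothetical rows shaped like the head() output in the question
survey <- data.frame(
  Preference = c("News/Info/Talk", "Classical", "Rock and Top 40",
                 "Jazz", "News/Info/Talk", "Don't listen"),
  Sex        = c("M", "F", "F", "M", "F", "F")
)

tab <- table(survey$Preference, survey$Sex)  # two-way frequency table
# fit <- ca(tab)                             # correspondence analysis on it
```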

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve?*
*Tell me what you want to do, not how you want to do it.*


On Fri, Mar 29, 2019 at 6:35 AM Alfredo 
wrote:

> Hi, I am very new to r and need help from you to do a correspondence
> analysis because I don't know how to structure the following data:
>
> Thank you.
>
> Alfredo
>
>
>
> library(ca,lib.loc=folder)
>
> table <- read.csv(file="C:\\Temp\\Survey_Data.csv", header=TRUE, sep=",")
>
> head (table, n=20)
>
> Preference   SexAge   Time
>
> 1   News/Info/Talk M  25-30  06-09
>
> 2Classical F  >3509-12
>
> 3  Rock and Top 40 F  21-25  12-13
>
> 4 Jazz M  >3513-16
>
> 5   News/Info/Talk F  25-30  16-18
>
> 6 Don't listen F  30-35  18-20
>
> ...
>
> 19 Rock and Top 40 M  25-30  16-18
>
> 20  Easy Listening F  >3518-20
>
>
>
> In SAS I would simply do this:
>
> proc corresp data=table dim=2 outc=_coord;
>
>table Preference, Sex Age Time;
>
> run;
>
>
>
> I don't know how to convert a data frame in R to a frequency table to
> execute this function properly:
>
> ca <- ca(, graph=FALSE)
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] aggregate output to data frame

2019-03-29 Thread jim holtman
You can also use 'dplyr'

library(tidyverse)
result <- pcr %>%
  group_by(Gene, Type, Rep) %>%
  summarise(mean = mean(Ct),
            sd   = sd(Ct),
            # mirrors the formula in the original post; the usual
            # standard error would be sd(Ct) / sqrt(n())
            oth  = sd(Ct) / sqrt(sd(Ct))
  )
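
For the base-R route the original poster started from, the matrix column
that aggregate() produces can also be flattened in one step with do.call
(a sketch assuming the same pcr data frame):

```r
stats <- aggregate(Ct ~ Gene + Type + Rep, data = pcr,
                   FUN = function(x) c(mean = mean(x), sd = sd(x)))
# do.call() expands the embedded matrix into ordinary columns
# named Ct.mean and Ct.sd
flat <- do.call(data.frame, stats)
str(flat)
```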

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve?*
*Tell me what you want to do, not how you want to do it.*


On Wed, Mar 27, 2019 at 7:40 PM Jim Lemon  wrote:

> Hi Cyrus,
> Try this:
>
> pcr<-data.frame(Ct=runif(66,10,20),Gene=rep(LETTERS[1:22],3),
>  Type=rep(c("Std","Unkn"),33),Rep=rep(1:3,each=22))
> testagg<-aggregate(pcr$Ct,c(pcr["Gene"],pcr["Type"],pcr["Rep"]),
>  FUN=function(x){c(mean(x), sd(x), sd(x)/sqrt(sd(x)))})
> nxcol<-dim(testagg$x)[2]
> newxs<-paste("x",1:nxcol,sep="")
> for(col in 1:nxcol)
>  testagg[[newxs[col]]]<-testagg$x[,col]
> testagg$x<-NULL
>
> Jim
>
> On Thu, Mar 28, 2019 at 12:39 PM cir p via R-help 
> wrote:
> >
> > Dear users,
> > i am trying to summarize data using "aggregate" with the following
> command:
> >
> >
> aggregate(pcr$Ct,c(pcr["Gene"],pcr["Type"],pcr["Rep"]),FUN=function(x){c(mean(x),
> sd(x), sd(x)/sqrt(sd(x)))})
> >
> > and the structure of the resulting data frame is
> >
> > 'data.frame':66 obs. of  4 variables:
> > $ Gene: Factor w/ 22 levels "14-3-3e","Act5C",..: 1 2 3 4 5 6 7 8 9 10
> ...
> > $ Type: Factor w/ 2 levels "Std","Unkn": 2 2 2 2 2 2 2 2 2 2 ...
> > $ Rep : int  1 1 1 1 1 1 1 1 1 1 ...
> >  $ x   : num [1:66, 1:3] 16.3 16.7 18.2 17.1 18.6 ...
> >
> > The actual data is "bundled" in a matrix $x of the data frame. I would
> like to have the columns of this matrix as individual numeric columns in
> the same data frame instead of a matrix, but cant really figure it out how
> to do this in an efficient way. Could someone help me with the construction
> of this?
> >
> > Thanks a lot,
> >
> > Cyrus
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] loop through columns in a data frame

2019-03-25 Thread jim holtman

You forgot to provide what your test data looks like. For example, are all
the columns a single letter followed by "_" as the name, or are there
longer names? Are there always matched pairs ('le' and 'me'), or can
singles occur?
library(tidyverse)

# create some data
test <- tibble(a_le = sample(3, 10, TRUE),
               a_me = sample(3, 10, TRUE),
               b_le = sample(3, 10, TRUE),
               b_me = sample(3, 10, TRUE),
               long_le = sample(3, 10, TRUE),
               long_me = sample(3, 10, TRUE),
               short_le = sample(3, 10, TRUE)
)

So get the names of the columns that contain ‘le’ or ‘me’ and group them
together for processing
col_names <- grep("_(le|me)$", names(test), value = TRUE)
group <- tibble(id = str_remove(col_names, "_.*"), col = col_names)
result <- group %>%
  group_by(id) %>%
  do(tibble(x = rowSums(test[, .$col] == 1)))

# add new columns back
for (i in split(result, result$id)) {
  test[, paste0(i$id[1], "_new")] <- as.integer(i$x > 0)
}
test

   a_le a_me b_le b_me long_le long_me short_le a_new b_new long_new
1     3    1    2    3       1       2        2     1     0        1
2     2    3    3    2       1       1        1     0     0        1
3     3    2    3    2       1       3        3     0     0        1
4     2    3    1    3       3       1        2     0     1        1
5     1    1    2    1       1       2        2     1     1        1
6     3    3    3    1       1       1        1     0     1        1
7     1    2    1    2       2       2        2     1     1        0
8     1    3    2    3       1       1        3     1     0        1
9     3    1    1    1       3       3        2     1     1        0
10    1    1    1    2       3       3        3     1     1        0

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve?*
*Tell me what you want to do, not how you want to do it.*


On Mon, Mar 25, 2019 at 10:08 AM Yuan, Keming (CDC/DDNID/NCIPC/DVP) via
R-help  wrote:

> Hi All,
>
> I have a data frame with variable names like A_le, A_me, B_le, B_me, C_le,
> C_me
> if A_le=1 or A_me=1 then  I need to create a new column A_new=1. Same
> operation to create columns B_new, C_new...
> Does anyone know how to use loop (or other methods) to create new columns?
> In SAS, I can use array to get it done. But I don't know how to do it in R.
>
> Thanks,
>
> Keming Yuan
> CDC
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Purr and Basic Functional Programming Tasks

2019-01-25 Thread jim holtman
Try this for the second question:

> years <- map2(zz,
+   list(c(2000, 2001), c(2001, 2003)),
+   ~ filter(.x, year %in% .y)
+ )
> years
[[1]]
# A tibble: 6 x 4
   year     tot_i relation           g_rate
  <dbl>     <dbl> <chr>               <dbl>
1  2000 22393349. EU28-Algeria       0.736
2  2001 23000574. EU28-Algeria       0.0271
3  2000 34361300. World-Algeria      0.615
4  2001 35297815. World-Algeria      0.0273
5  2000 11967951. Extra EU28-Algeria 0.428
6  2001 12297241. Extra EU28-Algeria 0.0275

[[2]]
# A tibble: 6 x 4
   year     tot_i relation          g_rate
  <dbl>     <dbl> <chr>              <dbl>
1  2001  7869288. EU28-Egypt       -0.148
2  2003  6395999. EU28-Egypt       -0.120
3  2001 19851236. World-Egypt      -0.0721
4  2003 16055014. World-Egypt      -0.175
5  2001 11981948. Extra EU28-Egypt -0.0147
6  2003  9659015. Extra EU28-Egypt -0.207

>
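
The same pairing can also be spelled out with an anonymous function
instead of the ~ .x/.y formula shorthand, which some find easier to read
(a sketch assuming zz and the year list from the question):

```r
library(purrr)
library(dplyr)

ll <- list(c(2000, 2001), c(2001, 2003))
years <- map2(zz, ll, function(df, yrs) filter(df, year %in% yrs))
```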

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve?*
*Tell me what you want to do, not how you want to do it.*


On Fri, Jan 25, 2019 at 5:45 AM Lorenzo Isella 
wrote:

> Dear All,
> I am making my baby steps with the tidyverse purr package and I am
> stuck with some probably trivial tasks.
> Consider the following data set
>
>
> zz<-list(structure(list(year = c(2000, 2001, 2002, 2003, 2000, 2001,
> 2002, 2003, 2000, 2001, 2002, 2003), tot_i = c(22393349.081,
> 23000574.372, 21682040.898, 21671102.853, 34361300.338, 35297814.942,
> 34745691.204, 35878883.117, 11967951.257, 12297240.57, 13063650.306,
> 14207780.264), relation = c("EU28-Algeria", "EU28-Algeria",
> "EU28-Algeria",
> "EU28-Algeria", "World-Algeria", "World-Algeria", "World-Algeria",
> "World-Algeria", "Extra EU28-Algeria", "Extra EU28-Algeria",
> "Extra EU28-Algeria", "Extra EU28-Algeria"), g_rate = c(0.736046372770467,
> 0.0271163231905857, -0.0573261107603093, -0.000504474880914325,
> 0.614846575418334, 0.0272549232650638, -0.0156418673197543,
> 0.0326138831530727,
> 0.428272657063707, 0.0275142592018328, 0.0623237165799383,
> 0.0875811837579971
> )), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
> )), structure(list(year = c(2000, 2001, 2002, 2003, 2000, 2001,
> 2002, 2003, 2000, 2001, 2002, 2003), tot_i = c(9233346.648, 7869288.171,
> 7271485.687, 6395999.102, 21393949.287, 19851236.26, 19449339.887,
> 16055014.309, 12160602.639, 11981948.089, 12177854.2, 9659015.207
> ), relation = c("EU28-Egypt", "EU28-Egypt", "EU28-Egypt", "EU28-Egypt",
> "World-Egypt", "World-Egypt", "World-Egypt", "World-Egypt", "Extra
> EU28-Egypt",
> "Extra EU28-Egypt", "Extra EU28-Egypt", "Extra EU28-Egypt"),
> g_rate = c(0.0970653722744164, -0.147731751985664, -0.0759665259436081,
> -0.120399959882366, 0.124744629514854, -0.0721097823643728,
> -0.0202454077789513, -0.174521376957825, 0.146712116047648,
> -0.0146912579338002, 0.0163501051368976, -0.206837670383671
> )), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
> )))
>
> I am capable of doing very simple stuff with maps, for instance
> iteratively taking the mean of a certain column
>
> map(zz, function(x) mean(x$tot_i))
>
> or filtering the values of the years
>
> map(zz, function(x) filter(x, year==2000))
>
> however, I bang my head against the wall as soon as I want to add a bit of
> complexity. For instance
>
> 1)I want to iteratively group the data in zz by relation and summarise
> them by taking the average of tot_i and
>
> 2)Given a list of years
>
> ll<-list(c(2000, 2001), c(2001, 2003))
>
> I would like to filter the two elements of the zz list according to the
> years listed in ll.
>
> I would then have plenty of other operations to carry out on the data, but
> already understanding 1 and 2 would take me a long way from where I am
> stuck now.
>
> Any suggestion is welcome.
> Cheers
>
> Lorenzo
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Purr and Basic Functional Programming Tasks

2019-01-25 Thread jim holtman
Does this answer the first question?

> rel <- map(zz, function(x){
+   group_by(x, relation) %>% summarise(tot = mean(tot_i))
+ })
> rel
[[1]]
# A tibble: 3 x 2
  relation                 tot
  <chr>                  <dbl>
1 EU28-Algeria       22186767.
2 Extra EU28-Algeria 12884156.
3 World-Algeria      35070922.

[[2]]
# A tibble: 3 x 2
  relation               tot
  <chr>                <dbl>
1 EU28-Egypt        7692530.
2 Extra EU28-Egypt 11494855.
3 World-Egypt      19187385.

>

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve?*
*Tell me what you want to do, not how you want to do it.*


On Fri, Jan 25, 2019 at 5:45 AM Lorenzo Isella 
wrote:

> Dear All,
> I am making my baby steps with the tidyverse purr package and I am
> stuck with some probably trivial tasks.
> Consider the following data set
>
>
> zz<-list(structure(list(year = c(2000, 2001, 2002, 2003, 2000, 2001,
> 2002, 2003, 2000, 2001, 2002, 2003), tot_i = c(22393349.081,
> 23000574.372, 21682040.898, 21671102.853, 34361300.338, 35297814.942,
> 34745691.204, 35878883.117, 11967951.257, 12297240.57, 13063650.306,
> 14207780.264), relation = c("EU28-Algeria", "EU28-Algeria",
> "EU28-Algeria",
> "EU28-Algeria", "World-Algeria", "World-Algeria", "World-Algeria",
> "World-Algeria", "Extra EU28-Algeria", "Extra EU28-Algeria",
> "Extra EU28-Algeria", "Extra EU28-Algeria"), g_rate = c(0.736046372770467,
> 0.0271163231905857, -0.0573261107603093, -0.000504474880914325,
> 0.614846575418334, 0.0272549232650638, -0.0156418673197543,
> 0.0326138831530727,
> 0.428272657063707, 0.0275142592018328, 0.0623237165799383,
> 0.0875811837579971
> )), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
> )), structure(list(year = c(2000, 2001, 2002, 2003, 2000, 2001,
> 2002, 2003, 2000, 2001, 2002, 2003), tot_i = c(9233346.648, 7869288.171,
> 7271485.687, 6395999.102, 21393949.287, 19851236.26, 19449339.887,
> 16055014.309, 12160602.639, 11981948.089, 12177854.2, 9659015.207
> ), relation = c("EU28-Egypt", "EU28-Egypt", "EU28-Egypt", "EU28-Egypt",
> "World-Egypt", "World-Egypt", "World-Egypt", "World-Egypt", "Extra
> EU28-Egypt",
> "Extra EU28-Egypt", "Extra EU28-Egypt", "Extra EU28-Egypt"),
> g_rate = c(0.0970653722744164, -0.147731751985664, -0.0759665259436081,
> -0.120399959882366, 0.124744629514854, -0.0721097823643728,
> -0.0202454077789513, -0.174521376957825, 0.146712116047648,
> -0.0146912579338002, 0.0163501051368976, -0.206837670383671
> )), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"
> )))
>
> I am capable of doing very simple stuff with maps, for instance
> iteratively taking the mean of a certain column
>
> map(zz, function(x) mean(x$tot_i))
>
> or filtering the values of the years
>
> map(zz, function(x) filter(x, year==2000))
>
> however, I bang my head against the wall as soon as I want to add a bit of
> complexity. For instance
>
> 1)I want to iteratively group the data in zz by relation and summarise
> them by taking the average of tot_i and
>
> 2)Given a list of years
>
> ll<-list(c(2000, 2001), c(2001, 2003))
>
> I would like to filter the two elements of the zz list according to the
> years listed in ll.
>
> I would then have plenty of other operations to carry out on the data, but
> already understanding 1 and 2 would take me a long way from where I am
> stuck now.
>
> Any suggestion is welcome.
> Cheers
>
> Lorenzo
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using apply

2018-10-30 Thread jim holtman
> s2 <- apply(x*x, 2, sum)
> s2
[1]  55 330
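
An equivalent idiom that is usually faster and clearer for column sums is
colSums():

```r
x <- matrix(1:10, nrow = 5)
colSums(x^2)  # sums of squares by column: 55 330
```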

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On Tue, Oct 30, 2018 at 10:28 PM Steven Yen  wrote:
>
> I need help with "apply". Below, I have no problem getting the column sums.
> 1. How do I get the sum of squares?
> 2. In general, where do I look up these functions?
> Thanks.
>
> x<-matrix(1:10,nrow=5); x
> sum <- apply(x,2,sum); sum
>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] date and time data on x axis

2018-10-28 Thread jim holtman
You need to specify what the format of the date will be.  I am using
ggplot for the plot:


library(lubridate)
library(tidyverse)
mydata <- read.table(text = "time value
20181028_10:00:00 600
20181028_10:00:01 500
20181028_10:00:02 450
20181028_10:00:03 660", header = TRUE, as.is = TRUE)

mydata <- mutate(mydata,
 time = ymd_hms(time)
)

ggplot(mydata, aes(time, value)) +
  geom_point() +
  scale_x_datetime(date_labels = "%m/%d %H:%M:%S"
  ) +
  theme(axis.text.x = element_text(angle = 25, vjust = 1.0, hjust = 1.0))
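
For the base-graphics plot() call in the question, the axis can also be
drawn manually with axis.POSIXct() (a sketch assuming xdata is already
POSIXct, as produced by ymd_hms):

```r
plot(xdata, ydata, type = "o", xaxt = "n")             # suppress default x axis
axis.POSIXct(1, x = xdata, format = "%m/%d %H:%M:%S")  # draw formatted labels
```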

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Sun, Oct 28, 2018 at 11:23 AM snowball0916  wrote:
>
> Hi, guys
> How do you guys deal with the date and time data on x axis?
> I have some trouble with it. Could you help with this?
>
> =
> Sample Data
> =
> The sample data look like this:
>
> 20181028_10:00:00 600
> 20181028_10:00:01 500
> 20181028_10:00:02 450
> 20181028_10:00:03 660
> ..
>
> =
> My Code
> =
>
> library(lubridate)
> mydata <- read.table("e:/R_study/graph_test2.txt")
> xdata <- ymd_hms(mydata$V1)
> ydata <- mydata$V2
> plot(xdata, ydata, type="o")
>
>
> =
> Questions:
> =
>
> 1. Why does my x axis not show the correct date-time, like "2018-10-28
> 10:00:00 UTC"?
> 2. If my data is very large (like data every second for a whole day, or
> even a whole month), how can I display the x axis in a neat and clean
> way?
>
> Thanks very much.
>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Need to understand how to troubleshoot below error

2018-10-20 Thread jim holtman
Can you show the code that was being executed at the time? Have you
verified that the path to the file is correct for the directory that you
are using? Have you validated that you have the correct permissions in the
directory to create the file? Show the complete path that you were using,
then follow that path to make sure there is a directory there that you can
write into.
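
A quick way to check those points from within R (the file name is taken
from the error message in the post):

```r
getwd()                               # directory relative paths resolve against
dir.exists(getwd())                   # does the working directory exist?
file.access(getwd(), mode = 2) == 0   # TRUE if the directory is writable
file.exists("housedatacomplete.csv")  # is the file visible from here?
```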

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve?*
*Tell me what you want to do, not how you want to do it.*


On Sat, Oct 20, 2018 at 1:19 PM MEENA SUBRAMANIAN via R-help <
r-help@r-project.org> wrote:

> Hi
> Im unable to write or save my R studio files
> Below error is thrown when the same code works for others
> Error in file(file, ifelse(append, "a", "w")) :   cannot open the
> connectionIn addition: Warning message:In file(file, ifelse(append, "a",
> "w")) :  cannot open file 'housedatacomplete.csv': No such file or directory
>
> Meena
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] read txt file - date - no space

2018-08-01 Thread jim holtman
Try this:

> library(lubridate)
> library(tidyverse)
> input <- read.csv(text = "date,str1,str2,str3
+ 10/1/1998 0:00,0.6,0,0
+   10/1/1998 1:00,0.2,0.2,0.2
+   10/1/1998 2:00,0.6,0.2,0.4
+   10/1/1998 3:00,0,0,0.6
+   10/1/1998 4:00,0,0,0
+   10/1/1998 5:00,0,0,0
+   10/1/1998 6:00,0,0,0
+   10/1/1998 7:00,0.2,0,0", as.is = TRUE)
> # convert the date and add the "day" to summarise by
> input <- input %>%
+   mutate(date = mdy_hm(date),
+  day = floor_date(date, unit = 'day')
+   )
>
> by_day <- input %>%
+   group_by(day) %>%
+   summarise(m_s1 = mean(str1),
+ m_s2 = mean(str2),
+ m_s3 = mean(str3)
+   )
>
> by_day
# A tibble: 1 x 4
  day                  m_s1   m_s2  m_s3
  <dttm>              <dbl>  <dbl> <dbl>
1 1998-10-01 00:00:00 0.200 0.0500 0.150
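
A base-R version of the same daily mean (a sketch assuming the input data
frame built above, before the lubridate grouping):

```r
# as.Date() truncates each POSIXct timestamp to its day
input$day <- as.Date(input$date)
aggregate(cbind(str1, str2, str3) ~ day, data = input, FUN = mean)
```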

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve?*
*Tell me what you want to do, not how you want to do it.*


On Tue, Jul 31, 2018 at 11:54 PM Diego Avesani 
wrote:

> Dear all,
> I am sorry, I did a lot of confusion. I am sorry, I have to relax and stat
> all again in order to understand.
> If I could I would like to start again, without mixing strategy and waiting
> for your advice.
>
> I am really appreciate you help, really really.
> Here my new file, a *.csv file (buy the way, it is possible to attach it in
> the mailing list?)
>
> date,str1,str2,str3
> 10/1/1998 0:00,0.6,0,0
> 10/1/1998 1:00,0.2,0.2,0.2
> 10/1/1998 2:00,0.6,0.2,0.4
> 10/1/1998 3:00,0,0,0.6
> 10/1/1998 4:00,0,0,0
> 10/1/1998 5:00,0,0,0
> 10/1/1998 6:00,0,0,0
> 10/1/1998 7:00,0.2,0,0
>
>
> I read it as:
> MyData <- read.csv(file="obs_prec.csv",header=TRUE, sep=",")
>
> at this point I would like to have the daily mean.
> What would you suggest?
>
> Really Really thanks,
> You are my lifesaver
>
> Thanks
>
>
>
> Diego
>
>
> On 1 August 2018 at 01:01, Jeff Newmiller 
> wrote:
>
> > ... and the most common source of NA values in time data is wrong
> > timezones. You really need to make sure the timezone that is assumed when
> > the character data are converted to POSIXt agrees with the data. In most
> > cases the easiest way to insure this is to use
> >
> > Sys.setenv(TZ="US/Pacific")
> >
> > or whatever timezone from
> >
> > OlsonNames()
> >
> > corresponds with your data. Execute this setenv function before the
> > strptime or as.POSIXct() function call.
> >
> > You can use
> >
> > MyData[ is.na(MyData$datetime), ]
> >
> > to see which records are failing to convert time.
> >
> > [1] https://github.com/jdnewmil/eci298sp2016/blob/master/QuickHowto1
> >
> > On July 31, 2018 3:04:05 PM PDT, Jim Lemon  wrote:
> > >Hi Diego,
> > >I think the error is due to NA values in your data file. If I extend
> > >your example and run it, I get no errors:
> > >
> > >MyData<-read.table(text="103001930 103001580 103001530
> > >1998-10-01 00:00:00 0.6 0 0
> > >1998-10-01 01:00:00 0.2 0.2 0.2
> > >1998-10-01 02:00:00 0.6 0.2 0.4
> > >1998-10-01 03:00:00 0 0 0.6
> > >1998-10-01 04:00:00 0 0 0
> > >1998-10-01 05:00:00 0 0 0
> > >1998-10-01 06:00:00 0 0 0
> > >1998-10-01 07:00:00 0.2 0 0
> > >1998-10-01 08:00:00 0.6 0 0
> > >1998-10-01 09:00:00 0.2 0.2 0.2
> > >1998-10-01 10:00:00 0.6 0.2 0.4
> > >1998-10-01 11:00:00 0 0 0.6
> > >1998-10-01 12:00:00 0 0 0
> > >1998-10-01 13:00:00 0 0 0
> > >1998-10-01 14:00:00 0 0 0
> > >1998-10-01 15:00:00 0.2 0 0
> > >1998-10-01 16:00:00 0.6 0 0
> > >1998-10-01 17:00:00 0.2 0.2 0.2
> > >1998-10-01 18:00:00 0.6 0.2 0.4
> > >1998-10-01 19:00:00 0 0 0.6
> > >1998-10-01 20:00:00 0 0 0
> > >1998-10-01 21:00:00 0 0 0
> > >1998-10-01 22:00:00 0 0 0
> > >1998-10-01 23:00:00 0.2 0 0
> > >1998-10-02 00:00:00 0.6 0 0
> > >1998-10-02 01:00:00 0.2 0.2 0.2
> > >1998-10-02 02:00:00 0.6 0.2 0.4
> > >1998-10-02 03:00:00 0 0 0.6
> > >1998-10-02 04:00:00 0 0 0
> > >1998-10-02 05:00:00 0 0 0
> > >1998-10-02 06:00:00 0 0 0
> > >1998-10-02 07:00:00 0.2 0 0
> > >1998-10-02 08:00:00 0.6 0 0
> > >1998-10-02 09:00:00 0.2 0.2 0.2
> > >1998-10-02 10:00:00 0.6 0.2 0.4
> > >1998-10-02 11:00:00 0 0 0.6
> > >1998-10-02 12:00:00 0 0 0
> > >1998-10-02 13:00:00 0 0 0
> > >1998-10-02 14:00:00 0 0 0
> > &

Re: [R] values of list of variable names

2018-06-01 Thread jim holtman
You probably want to use 'get':

> r1 <- 5
> r2 <- 3
> r3 <- 45
> x <- ls(pattern = '^r.$')
> x
[1] "r1" "r2" "r3"
> lapply(x, get)
[[1]]
[1] 5

[[2]]
[1] 3

[[3]]
[1] 45

>
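
Base R's mget() does the ls()-plus-get() dance in one call and returns a
named list:

```r
r1 <- 5; r2 <- 3; r3 <- 45
mget(ls(pattern = "^r.$"))  # named list: list(r1 = 5, r2 = 3, r3 = 45)
```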


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Jun 1, 2018 at 7:25 AM, Christian  wrote:

> Hi,
>
> I have searched the documentations of eval, substitute, expression, and I
> cannot make work something like the values of a list of variable names:
>
> lis <- ls(pattern="pr") # all variables with names containing 'pr'
>
> What is the mantra giving me the _values_ of the variables whose names
> are  contained in 'lis'. eval(parse(ls(pattern="pr"))) will not do but
> returning TRUE.
>
> TIA
> C.
> --
> Christian Hoffmann
> Rigiblickstrasse 15b
> CH-8915 Hausen am Albis
> Switzerland
> Telefon +41-(0)44-7640853
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posti
> ng-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Convert daily data to weekly data

2018-05-29 Thread jim holtman
I forgot the year; here is a version that also groups by it.


> x <- structure(list(X1986.01.01.10.30.00 = c(16.8181762695312,
16.8181762695312,
+  18.8294372558594, 16 
[TRUNCATED]

> library(tidyverse)

> library(lubridate)

> # convert to long form
> x_long <- gather(x, key = 'date', value = "value", -ID)

> # change the date to POSIXct
> x_long$date <- ymd_hms(substring(x_long$date, 2, 19))

> # add the week of the year
> x_long$week <- week(x_long$date)

> x_long$year <- year(x_long$date)

> # average by ID/week
> avg <- x_long %>%
+   group_by(ID, year, week) %>%
+   summarise(avg = mean(value))
> avg
# A tibble: 6 x 4
# Groups:   ID, year [?]
     ID  year  week   avg
  <int> <dbl> <dbl> <dbl>
1     1 1986.    1.  16.0
2     2 1986.    1.  16.0
3     3 1986.    1.  17.9
4     4 1986.    1.  16.0
5     5 1986.    1.  17.9
6     6 1986.    1.  16.0



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Tue, May 29, 2018 at 7:02 AM, Miluji Sb  wrote:

> Dear Petr,
>
> Thanks for your reply and the solution. The example dataset contains data
> for the first six days of the year 1986. "X1986.01.01.10.30.00" is 01
> January 1986 and the rest of the variable is redundant information. The
> last date is given as "X2016.12.31.10.30.00".
>
> I realized that I missed one information in my previous email, I would like
> to compute the weekly average by the variable ID. Thanks again!
>
> Sincerely,
>
> Shouro
>
> On Tue, May 29, 2018 at 3:24 PM, PIKAL Petr 
> wrote:
>
> > Hi
> >
> > Based on your explanation I would advise using
> >
> > ?cut.POSIXt
> >
> > with breaks "week". However your data are rather strange, you have data
> > frame with names which looks like dates
> >
> > names(test)
> > [1] "X1986.01.01.10.30.00" "X1986.01.02.10.30.00" "X1986.01.03.10.30.00"
> > [4] "X1986.01.04.10.30.00" "X1986.01.05.10.30.00" "X1986.01.06.10.30.00"
> > [7] "ID"
> >
> > and under each name you have 6 numeric values
> > test[,1]
> > [1] 16.81818 16.81818 18.82944 16.81818 18.82944 16.83569
> >
> > You (probably) can get dates by
> > as.Date(substring(names(test),2,11), format="%Y.%m.%d")
> > [1] "1986-01-01" "1986-01-02" "1986-01-03" "1986-01-04" "1986-01-05"
> > [6] "1986-01-06" NA
> >
> > but if you want just average those 6 values below each date you could do
> >
> > colMeans(test)
> >
> > and/or bind it together.
> >
> > > ddd<-as.Date(substring(names(test),2,11), format="%Y.%m.%d")
> > > data.frame(ddd, aver=colMeans(test))
> > ddd aver
> > X1986.01.01.10.30.00 1986-01-01 17.49152
> > X1986.01.02.10.30.00 1986-01-02 16.84200
> > X1986.01.03.10.30.00 1986-01-03 16.51526
> > X1986.01.04.10.30.00 1986-01-04 16.90191
> > X1986.01.05.10.30.00 1986-01-05 16.00480
> > X1986.01.06.10.30.00 1986-01-06 16.04405
> > ID   3.5
> >
> > Cheers
> > Petr
> >
> > Tento e-mail a jakékoliv k němu připojené dokumenty jsou důvěrné a
> > podléhají tomuto právně závaznému prohlášení o vyloučení odpovědnosti:
> > https://www.precheza.cz/01-dovetek/ | This email and any documents
> > attached to it may be confidential and are subject to the legally binding
> > disclaimer: https://www.precheza.cz/en/01-disclaimer/
> >
> > > -Original Message-
> > > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Miluji
> > Sb
> > > Sent: Tuesday, May 29, 2018 2:59 PM
> > > To: r-help mailing list 
> > > Subject: [R] Convert daily data to weekly data
> > >
> > > Dear all,
> > >
> > > I have daily data in wide-format from 01/01/1986 to 31/12/2016 by ID.
> > > I would like to convert this to weekly average data. The data has been
> > > generated by an algorithm.
> > >
> > > I know that I can use the lubridate package but that would require me
> > > to first convert the data to long-form (which is what I want). I am at
> > > a bit of a loss as to how to extract the date from the variable names
> > > and then convert the data to weekly averages. Any help will be highly
> > > appreciated.
> > >
> > > ### data
> > > structure(list(X1986.01.01.10.30.00 = c(1

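A minimal sketch of Petr's `cut.POSIXt`/`cut.Date` suggestion, on hypothetical daily data (the variable names here are made up, not from the thread):

```r
# Hypothetical daily series: two full weeks of values
dates <- seq(as.Date("1986-01-01"), by = "day", length.out = 14)
vals  <- 1:14

# breaks = "week" labels each day with the Monday that starts its week
wk <- cut(dates, breaks = "week")

# weekly averages, one value per week
tapply(vals, wk, mean)
```

The same grouping vector can also be fed to `aggregate()` if a data frame result is preferred.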
Re: [R] Convert daily data to weekly data

2018-05-29 Thread jim holtman
try this:

> x <- structure(list(X1986.01.01.10.30.00 = c(16.8181762695312,
16.8181762695312,
+  18.8294372558594, 16 
[TRUNCATED]

> library(tidyverse)

> library(lubridate)

> # convert to long form
> x_long <- gather(x, key = 'date', value = "value", -ID)

> # change the date to POSIXct
> x_long$date <- ymd_hms(substring(x_long$date, 2, 19))

> # add the week of the year
> x_long$week <- week(x_long$date)

> # average by ID/week
> avg <- x_long %>%
+   group_by(ID, week) %>%
+   summarise(avg = mean(value))
> avg
# A tibble: 6 x 3
# Groups:   ID [?]
     ID  week   avg
  <dbl> <dbl> <dbl>
1     1    1.  16.0
2     2    1.  16.0
3     3    1.  17.9
4     4    1.  16.0
5     5    1.  17.9
6     6    1.  16.0
>


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Tue, May 29, 2018 at 7:02 AM, Miluji Sb  wrote:

> Dear Petr,
>
> Thanks for your reply and the solution. The example dataset contains data
> for the first six days of the year 1986. "X1986.01.01.10.30.00" is 01
> January 1986 and the rest of the variable is redundant information. The
> last date is given as "X2016.12.31.10.30.00".
>
> I realized that I missed one information in my previous email, I would like
> to compute the weekly average by the variable ID. Thanks again!
>
> Sincerely,
>
> Shouro
>
> On Tue, May 29, 2018 at 3:24 PM, PIKAL Petr 
> wrote:
>
> > Hi
> >
> > Based on your explanation I would advise using
> >
> > ?cut.POSIXt
> >
> > with breaks "week". However, your data are rather strange: you have a
> > data frame with names which look like dates
> >
> > names(test)
> > [1] "X1986.01.01.10.30.00" "X1986.01.02.10.30.00" "X1986.01.03.10.30.00"
> > [4] "X1986.01.04.10.30.00" "X1986.01.05.10.30.00" "X1986.01.06.10.30.00"
> > [7] "ID"
> >
> > and under each name you have 6 numeric values
> > test[,1]
> > [1] 16.81818 16.81818 18.82944 16.81818 18.82944 16.83569
> >
> > You (probably) can get dates by
> > as.Date(substring(names(test),2,11), format="%Y.%m.%d")
> > [1] "1986-01-01" "1986-01-02" "1986-01-03" "1986-01-04" "1986-01-05"
> > [6] "1986-01-06" NA
> >
> > but if you just want to average those 6 values below each date you could do
> >
> > colMeans(test)
> >
> > and/or bind it together.
> >
> > > ddd<-as.Date(substring(names(test),2,11), format="%Y.%m.%d")
> > > data.frame(ddd, aver=colMeans(test))
> > ddd aver
> > X1986.01.01.10.30.00 1986-01-01 17.49152
> > X1986.01.02.10.30.00 1986-01-02 16.84200
> > X1986.01.03.10.30.00 1986-01-03 16.51526
> > X1986.01.04.10.30.00 1986-01-04 16.90191
> > X1986.01.05.10.30.00 1986-01-05 16.00480
> > X1986.01.06.10.30.00 1986-01-06 16.04405
> > ID   3.5
> >
> > Cheers
> > Petr
> >
> > Tento e-mail a jakékoliv k němu připojené dokumenty jsou důvěrné a
> > podléhají tomuto právně závaznému prohlášení o vyloučení odpovědnosti:
> > https://www.precheza.cz/01-dovetek/ | This email and any documents
> > attached to it may be confidential and are subject to the legally binding
> > disclaimer: https://www.precheza.cz/en/01-disclaimer/
> >
> > > -Original Message-
> > > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Miluji
> > Sb
> > > Sent: Tuesday, May 29, 2018 2:59 PM
> > > To: r-help mailing list 
> > > Subject: [R] Convert daily data to weekly data
> > >
> > > Dear all,
> > >
> > > I have daily data in wide-format from 01/01/1986 to 31/12/2016 by ID.
> > > I would like to convert this to weekly average data. The data has been
> > > generated by an algorithm.
> > >
> > > I know that I can use the lubridate package but that would require me
> > > to first convert the data to long-form (which is what I want). I am at
> > > a bit of a loss as to how to extract the date from the variable names
> > > and then convert the data to weekly averages. Any help will be highly
> > > appreciated.
> > >
> > > ### data
> > > structure(list(X1986.01.01.10.30.00 = c(16.8181762695312,
> > > 16.8181762695312, 18.8294372558594, 16.8181762695312,
> > > 18.8294372558594, 16.83569

Re: [R] Split a data.frame

2018-05-19 Thread jim holtman
Forgot to take care of the boundary conditions:

# revised data.frame to take care of boundary conditions
DF = data.frame(name = c('b', 'a','v','z', 'c','d'), val = 0); DF
##   name val
## 1    b   0
## 2    a   0
## 3    v   0
## 4    z   0
## 5    c   0
## 6    d   0
split_str = c('a', 'c')

# If we assume that the values in split_str are ordered in
# the same order as in the dataframe, then this might work.
offsets <- match(split_str, DF$name)

# now find the values inbetween the offsets
ret_indx <- NULL
for (i in seq_len(length(offsets) - 1)){
  if (offsets[i + 1] - offsets[i] > 1){  # something inbetween
ret_indx <- c(ret_indx, (offsets[i] + 1):(offsets[i+1] - 1))
  }
}
DF[ret_indx, ]
##   name val
## 3    v   0
## 4    z   0



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, May 19, 2018 at 4:07 AM, Christofer Bogaso <
bogaso.christo...@gmail.com> wrote:

> Hi,
>
> I am struggling to split a data.frame as per the below scheme:
>
> DF = data.frame(name = c('a', 'v', 'c'), val = 0); DF
>
> split_str = c('a', 'c')
>
> Now, for each element in split_str, R should find which row of DF contains
> that element, and return DF with all rows starting from next row of the
> corresponding element and ending with the preceding value of the next
> element.
>
> So in my case, I should see 2 data.frames
>
> 1st data-frame with name = 'v' (i.e. 2nd row of DF)
>
> 2nd data.frame with number_of_rows as 0 (as there is no row left after 'c')
>
> Similarly if split_str = c('v') then my 2 data.frames will be
>
> 1st data.frame with name = 'a'
> 2nd data.frame with name = 'c'
>
> Any idea how to efficiently implement above scheme would be highly
> appreciated. I tried with split() function, however, it is not giving the
> right answer.
>
> Thanks,
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
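A base-R sketch of the same splitting idea using `findInterval()`, built on the poster's example data (the grouping approach is an alternative, not code from the thread); rows before the first marker land in group "0":

```r
DF <- data.frame(name = c('b', 'a', 'v', 'z', 'c', 'd'), val = 0)
split_str <- c('a', 'c')

cuts <- match(split_str, DF$name)               # rows where the markers sit
grp  <- findInterval(seq_len(nrow(DF)), cuts)   # 0 = rows before the first marker
chunks <- split(DF, grp)

# drop the marker row that opens each group
chunks <- lapply(chunks, function(d) d[!(d$name %in% split_str), ])
```

Here `chunks[["1"]]` holds the rows between 'a' and 'c', and `chunks[["2"]]` the rows after 'c'.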


Re: [R] Split a data.frame

2018-05-19 Thread jim holtman
DF = data.frame(name = c('a', 'v', 'c'), val = 0); DF
##   name val
## 1    a   0
## 2    v   0
## 3    c   0
split_str = c('a', 'c')
# If we assume that the values in split_str are ordered in the same order
as in the dataframe, then this might work.

offsets <- match(split_str, DF$name)
# Since you only want the rows in between

DF[diff(offsets), ]
##   name val
## 2    v   0


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, May 19, 2018 at 7:58 AM, Rui Barradas <ruipbarra...@sapo.pt> wrote:

> Hello,
>
> Maybe something like the following.
>
> splitDF <- function(data, col, s){
> n <- nrow(data)
> inx <- which(data[[col]] %in% s)
> lapply(seq_along(inx), function(i){
> k <- if(inx[i] < n) (inx[i] + 1):(inx[i + 1])
> data[k, ]
> })
> }
>
> splitDF(DF, "name", split_str)
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> On 5/19/2018 12:07 PM, Christofer Bogaso wrote:
>
>> Hi,
>>
>> I am struggling to split a data.frame as per the below scheme:
>>
>> DF = data.frame(name = c('a', 'v', 'c'), val = 0); DF
>>
>> split_str = c('a', 'c')
>>
>> Now, for each element in split_str, R should find which row of DF contains
>> that element, and return DF with all rows starting from next row of the
>> corresponding element and ending with the preceding value of the next
>> element.
>>
>> So in my case, I should see 2 data.frames
>>
>> 1st data-frame with name = 'v' (i.e. 2nd row of DF)
>>
>> 2nd data.frame with number_of_rows as 0 (as there is no row left after
>> 'c')
>>
>> Similarly if split_str = c('v') then my 2 data.frames will be
>>
>> 1st data.frame with name = 'a'
>> 2nd data.frame with name = 'c'
>>
>> Any idea how to efficiently implement above scheme would be highly
>> appreciated. I tried with split() function, however, it is not giving the
>> right answer.
>>
>> Thanks,
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posti
>> ng-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posti
> ng-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to write a loop to repetitive jobs

2018-04-18 Thread jim holtman
Try this:


result <- lapply(71:75, function(x){
# use 'paste0' to add the number to the file name
input <-
read.csv(paste0("C:/Awork/geneAssociation/removed8samples/neuhausen",
x,
"/seg.pr3.csv")
, head=TRUE
)
input$id <- paste0("sn", x)
input  # return the input
})

result <- do.call(rbind, result)  # combine dataframes together


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Mon, Apr 16, 2018 at 1:54 PM, Ding, Yuan Chun <ycd...@coh.org> wrote:

> Hi All..,
>
> I need to do the following repetitive jobs:
>
> seg71 <- 
> read.csv("C:/Awork/geneAssociation/removed8samples/neuhausen71/seg.pr3.csv",
> head=T)
> seg71$id <-"sn71"
>
> seg72 <- 
> read.csv("C:/Awork/geneAssociation/removed8samples/neuhausen72/seg.pr3.csv",
> head=T)
> seg72$id <-"sn72"
>
> seg73 <- 
> read.csv("C:/Awork/geneAssociation/removed8samples/neuhausen73/seg.pr3.csv",
> head=T)
> seg73$id <-"sn73"
>
> seg74 <- 
> read.csv("C:/Awork/geneAssociation/removed8samples/neuhausen74/seg.pr3.csv",
> head=T)
> seg74$id <-"sn74"
>
> seg75 <- 
> read.csv("C:/Awork/geneAssociation/removed8samples/neuhausen75/seg.pr3.csv",
> head=T)
> seg75$id <-"sn75"
>
> seg <- rbind (seg71, seg72, seg73, seg74, seg75)
>
> I want to write a loop to do it;
>
> For ( d in 71:75) {
>   Dir<-paste("C:/Awork/geneAssociation/removed8samples/neuhausen", i,
> sep="")
>   setwd(Dir)
> ..
> then I do not know how to create objects seg71 to seg75;  in SAS, it would
> be  seg;
>
> I like R, but not good at R.
>
> Can you help me?
>
> Thank you,
>
> Ding
>
>
> -
> -SECURITY/CONFIDENTIALITY WARNING-
> This message (and any attachments) are intended solely...{{dropped:13}}

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] request for code

2018-01-18 Thread jim holtman
a simple Google search turns up several possible choices.  There is a
package 'matconv' that might serve your purposes.


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Thu, Jan 18, 2018 at 7:49 AM, Anjali Karol Nair <anjali...@gmail.com>
wrote:

> Hi,
>
> I want to convert my MATLAB programs to R studio programs.
> Kindly guide on the same.
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] dplyr - add/expand rows

2017-11-26 Thread jim holtman
try this:

##

library(dplyr)

input <- tribble(
  ~station, ~from, ~to, ~record,
 "07EA001" ,1960  ,  1960  , "QMS",
 "07EA001"  ,   1961 ,   1970  , "QMC",
 "07EA001" ,1971  ,  1971  , "QMM",
 "07EA001" ,1972  ,  1976  , "QMC",
 "07EA001" ,1977  ,  1983  , "QRC"
)

result <- input %>%
  rowwise() %>%
  do(tibble(station = .$station,
year = seq(.$from, .$to),
record = .$record)
  )

###



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Nov 26, 2017 at 2:10 PM, Bert Gunter <bgunter.4...@gmail.com> wrote:

> To David W.'s point about lack of a suitable reprex ("reproducible
> example"), Bill's solution seems to be for only one station.
>
> Here is a reprex and modification that I think does what was requested for
> multiple stations, again using base R and data frames, not dplyr and
> tibbles.
>
> First the reprex with **two** stations:
>
> > d <- data.frame( station = rep(c("one","two"),c(5,4)),
>from = c(60,61,71,72,76,60,65,82,83),
> to = c(60,70,71,76,83,64, 81, 82,83),
> record = c("A","B","C","B","D","B","B","D","E"))
>
> > d
>   station from to record
> 1 one   60 60  A
> 2 one   61 70  B
> 3 one   71 71  C
> 4 one   72 76  B
> 5 one   76 83  D
> 6 two   60 64  B
> 7 two   65 81  B
> 8 two   82 82  D
> 9 two   83 83  E
>
> ## Now the conversion code using base R, especially by():
>
> > out <- by(d, d$station, function(x) with(x, {
> +i <- to - from +1
> +data.frame(YEAR =sequence(i) -1 +rep(from,i), RECORD =rep(record,i))
> + }))
>
>
> > out <- data.frame(station =
> rep(names(out),sapply(out,nrow)),do.call(rbind,out), row.names = NULL)
>
>
> > out
>station YEAR RECORD
> 1  one   60  A
> 2  one   61  B
> 3  one   62  B
> 4  one   63  B
> 5  one   64  B
> 6  one   65  B
> 7  one   66  B
> 8  one   67  B
> 9  one   68  B
> 10 one   69  B
> 11 one   70  B
> 12 one   71  C
> 13 one   72  B
> 14 one   73  B
> 15 one   74  B
> 16 one   75  B
> 17 one   76  B
> 18 one   76  D
> 19 one   77  D
> 20 one   78  D
> 21 one   79  D
> 22 one   80  D
> 23 one   81  D
> 24 one   82  D
> 25 one   83  D
> 26 two   60  B
> 27 two   61  B
> 28 two   62  B
> 29 two   63  B
> 30 two   64  B
> 31 two   65  B
> 32 two   66  B
> 33 two   67  B
> 34 two   68  B
> 35 two   69  B
> 36 two   70  B
> 37 two   71  B
> 38 two   72  B
> 39 two   73  B
> 40 two   74  B
> 41 two   75  B
> 42 two   76  B
> 43 two   77  B
> 44 two   78  B
> 45 two   79  B
> 46 two   80  B
> 47 two   81  B
> 48 two   82  D
> 49 two   83  E
>
> Cheers,
> Bert
>
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Sat, Nov 25, 2017 at 4:49 PM, William Dunlap via R-help <
> r-help@r-project.org> wrote:
>
> > dplyr may have something for this, but in base R I think the following
> does
> > what you want.  I've shortened the name of your data set to 'd'.
> >
> > i <- rep(seq_len(nrow(d)), d$YEAR_TO-d$YEAR_FROM+1)
> > j <- sequence(d$YEAR_TO-d$YEAR_FROM+1)
> > transform(d[i,], YEAR=YEAR_FROM+j-1, YEAR_FROM=NULL, YEAR_TO=NULL)
> >
> >
> > Bill Dunlap
> > TIBCO Software
> > wdunlap tibco.com
> >
> > On Sat, Nov 25, 2017 at 11:18 AM, Hutchinson, David (EC) <
> > david.hutchin...@canada.ca> wrote:
> >
> > > I have a returned tibble of station operational record similar to the
> > > following:
> > >
> > > > data.collection
> > > # A tibble: 5 x 4
> > >   STATION_NUMBER YEAR_FROM YEAR_TO RECORD
> > >  
> > > 1 

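Bill Dunlap's `rep()`/`sequence()` idiom generalises directly to the station example; a minimal base-R sketch on hypothetical data (column names here are illustrative):

```r
d <- data.frame(station = c("07EA001", "07EA001"),
                from    = c(1960, 1961),
                to      = c(1960, 1970),
                record  = c("QMS", "QMC"))

n   <- d$to - d$from + 1                 # rows each range expands into
out <- data.frame(station = rep(d$station, n),
                  year    = sequence(n) - 1 + rep(d$from, n),
                  record  = rep(d$record, n))
```

This expands the two ranges into 11 one-year rows without any explicit loop.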
Re: [R] function pointers?

2017-11-23 Thread jim holtman
I am replying to the first part of the question about the size of the
object.  It is probably best to use the "object_size" function in the
"pryr" package:

 ‘object_size’ works similarly to ‘object.size’, but counts more
 accurately and includes the size of environments. ‘compare_size’
 makes it easy to compare the output of ‘object_size’ and
 ‘object.size’.

Here is what you get from the same code:

> N <- 1
> closureList <- vector("list", N)
> nsize = sample(x = 1:100, size = N, replace = TRUE)
> for (i in seq_along(nsize)){
+ closureList[[i]] <- list(func = rnorm, n = nsize[i])
+ }
> format(object.size(closureList), units = "Mb")
[1] "22.4 Mb"
> pryr::compare_size(closureList)
base pryr
23520040  2241776

You will notice that you get back a size that is 10X smaller because it is
accounting for the shared space.


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Wed, Nov 22, 2017 at 11:29 AM, Paul Johnson <pauljoh...@gmail.com> wrote:

> We have a project that calls for the creation of a list of many
> distribution objects.  Distributions can be of various types, with
> various parameters, but we ran into some problems. I started testing
> on a simple list of rnorm-based objects.
>
> I was a little surprised at the RAM storage requirements, here's an
> example:
>
> N <- 1
> closureList <- vector("list", N)
> nsize = sample(x = 1:100, size = N, replace = TRUE)
> for (i in seq_along(nsize)){
> closureList[[i]] <- list(func = rnorm, n = nsize[i])
> }
> format(object.size(closureList), units = "Mb")
>
> Output says
> 22.4 MB
>
> I noticed that if I do not name the objects in the list, then the
> storage drops to 19.9 MB.
>
> That seemed like a lot of storage for a function's name. Why so much?
> My colleagues think the RAM use is high because this is a closure
> (hence closureList).  I can't even convince myself it actually is a
> closure. The R source has
>
> rnorm <- function(n, mean=0, sd=1) .Call(C_rnorm, n, mean, sd)
>
> The storage holding 1 copies of rnorm, but we really only need 1,
> which we can use in the objects.
>
> Thinking of this like C,  I am looking to pass in a pointer to the
> function.  I found my way to the idea of putting a function in an
> environment in order to pass it by reference:
>
> rnormPointer <- function(inputValue1, inputValue2){
> object <- new.env(parent=globalenv())
> object$distr <- inputValue1
> object$n <- inputValue2
> class(object) <- 'pointer'
> object
> }
>
> ## Experiment with that
> gg <- rnormPointer(rnorm, 33)
> gg$distr(gg$n)
>
> ptrList <- vector("list", N)
> for(i in seq_along(nsize)) {
> ptrList[[i]] <- rnormPointer(rnorm, nsize[i])
> }
> format(object.size(ptrList), units = "Mb")
>
> The required storage is reduced to 2.6 Mb. Thats 1/10 of the RAM
> required for closureList.  This thing works in the way I expect
>
> ## can pass in the unnamed arguments for n, mean and sd here
> ptrList[[1]]$distr(33, 100, 10)
> ## Or the named arguments
> ptrList[[1]]$distr(1, sd = 100)
>
> This environment trick mostly works, so far as I can see, but I have
> these questions.
>
> 1. Is the object.size() return accurate for ptrList?  Do I really
> reduce storage to that amount, or is the required storage someplace
> else (in the new environment) that is not included in object.size()?
>
> 2. Am I running with scissors here? Unexpected bad things await?
>
> 3. Why is the storage for closureList so great? It looks to me like
> rnorm is just this little thing:
>
> function (n, mean = 0, sd = 1)
> .Call(C_rnorm, n, mean, sd)
> 
>
> 4. Could I learn (you show me?) to store the bytecode address as a
> thing and use it in the objects?  I'd guess that is the fastest
> possible way. In an Objective-C problem in the olden days, we found
> the method-lookup was a major slowdown and one of the programmers
> showed us how to save the lookup and use it over and over.
>
> pj
>
>
>
> --
> Paul E. Johnson   http://pj.freefaculty.org
> Director, Center for Research Methods and Data Analysis
> http://crmda.ku.edu
>
> To write to me directly, please address me at pauljohn at ku.edu.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
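On the storage question, a quick sketch showing that the lists all reference the same function object rather than holding distinct copies (which is why `object.size()`, ignoring the sharing, over-counts as Jim notes):

```r
# store the same function in several list elements
f_list <- lapply(1:3, function(i) list(func = rnorm, n = i))

# every element holds the identical function object, not a copy
stopifnot(identical(f_list[[1]]$func, f_list[[3]]$func))
```

`pryr::object_size()` (if installed) accounts for this sharing when reporting sizes.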

Re: [R] dealing with a messy dataset

2017-10-05 Thread jim holtman
You should be able to use that header information to create the
correct parameters to the read_fwf function to read in the data.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Thu, Oct 5, 2017 at 11:02 AM, jean-philippe
<jeanphilippe.fonta...@gssi.infn.it> wrote:
> dear Jim,
>
> Thanks for your reply and your proposition.
>
> I forgot to provide the header of the dataframe, here it is:
> 
> Byte-by-byte Description of file: lvg_table2.dat
> 
>Bytes Format Units   Label   Explanations
> 
>1- 18 A18 --- Name    Galaxy name in well-known catalogs
>   20- 21 I2 h   RAh Hour of Right Ascension (J2000)
>   22- 23 I2 min RAm Minute of Right Ascension (J2000)
>   24- 27 F4.1   s   RAs Second of Right Ascension (J2000)
>   28 A1 --- DE- Sign of the Declination (J2000)
>   29- 30 I2 deg DEd Degree of Declination (J2000)
>   31- 32 I2 arcmin  DEm Arcminute of Declination (J2000)
>   33- 34 I2 arcsec  DEs Arcsecond of Declination (J2000)
>   36- 40 F5.2   kpc a26 ? Major linear diameter (1)
>   42- 43 I2 deg inc ? Inclination
>   45- 47 I3 km/sVm  ? Amplitude of rotational velocity (2)
>   49- 52 F4.2   mag AB  ? Internal B band extinction (3)
>   54- 58 F5.1   mag BMag    ? Absolute B band magnitude (4)
>   60- 63 F4.1   mag/arcsec2 SBB ? Average B band surface brightness (5)
>   65- 69 F5.2   [solLum]   logKLum ? Log K_S_ band luminosity (6)
>   71- 75 F5.2   [solMass]   logM26  ? Log mass within Holmberg radius (7)
>   77 A1 ---   l_logMHI  Limit flag on logMHI
>   78- 82 F5.2   [solMass]   logMHI  ? Log hydrogen mass (8)
>   84- 87 I4 km/sVLG ? Radial velocity (9)
>   89- 92 F4.1   --- Theta1  ? Tidal index (10)
>   94-116 A23--- MD  Main disturber name (11)
>  118-121 F4.1   --- Theta5  ? Another tidal index (12)
>  123-127 F5.2   [-] Thetaj  ? Log K band luminosity density (13)
> 
>
> The idea for me is to select only the galaxy name and the logMHI values for
> these galaxies, so quite a simple job when the dataset is tidy enough. I was
> thinking as usual to use select from dplyr.
> That is why I was just asking how to read this kind of files which, for me
> so far, are uncommon.
>
> Doing what you propose, it formats most of the columns correctly except a
> few; I will see how I can change some widths to get it right:
>
>   X1  X2  X3  X4  X5  X6  X7  X8  X9  X10
> X11   X12   X13   X14  X15   X16 X17
>(chr)   (chr) (dbl) (int) (dbl) (dbl) (chr) (dbl) (chr)
> (chr) (int) (chr) (chr) (chr) (chr) (dbl)   (chr)
> 1   UGC12894 22.5+392944  2.783321 0 -13.3  25.2 7.5 8  8.1
> 7   7.9 2  61 9 -1.3 NGC7640-1 0  0.12
> 2WLM 000158.1-152740  3.259022 0 -14.1 24.8 7.7 0 8.2
> 7   7.8 4  -1 6  0. 0 MESSIER031 0 2  1.75
> 3  And XVIII 000214.5+450520  0.6917 9 0  -8.7  26.8 6.4 4  6.7
> 8 < 6.6 5  -4 4  0. 5 MESSIER031 0 6  1.54
> 4  PAndAS-03 000356.4+405319  0.1017NA 0  -3.6  27.8 4.3  8
> NANANA2. 8 MESSIER031 2 8  1.75
> 5  PAndAS-04 000442.9+472142  0.0522NA 0  -6.6  23.1 5.5  9
> NANA   -10 8  2. 5 MESSIER031 2 5  1.75
> 6  PAndAS-05 000524.1+435535  0.0631NA 0  -4.5  25.6 4.7  5
> NANA10 3  2. 8 MESSIER031 2 8  1.75
> 7 ESO409-015 000531.8-280553  3.007823 0 -14.6  24.1 8.1 0  8.2
> 5   8.1 0  76 9 -2.0 NGC0024-1 5 -2.05
> 8  AGC748778 000634.4+153039 0.61 70 3 0 -10.4  24.9 6.3 9  5.7
> 0   6.6 4  48 6 -1.9 NGC0253-1 5 -2.72
> 9 And XX 000730.7+350756  0.2033 5 0  -5.8  27.1 5.2 6  5.7
> 0NA   -18 2  2. 4 MESSIER031 2 4  1.75
>
>
> Cheers, thanks again
>
>
> Jean-Philippe
> On 05/10/2017 16:49, jim holtman wrote:
>>
>> start <- c(1, 20, 35, 41, 44, 48, 53, 59, 64, 69, 75, 77, 82, 87,
>>  +92, 114, 121, 127)
>>  > read_fwf(input, fwf_widths(diff(start)))
>
>
> --
> Jean-Philippe Fontaine
> PhD Student in Astroparticle Physics,
> Gran Sasso Science Institute (GSSI),
> Viale F

Re: [R] dealing with a messy dataset

2017-10-05 Thread jim holtman
It looks like fixed width.  I just used the last position of each
field to get the size and used the 'readr' package;

> input <- "And XVIII  000214.5+450520  0.69 17   9 0.00
-8.7 26.8 6.44  6.78 < 6.65  -44  0.5 MESSIER031   0.6
1.54
+ PAndAS-03  000356.4+405319  0.10 17 0.00  -3.6 27.8
4.382.8 MESSIER031   2.8  1.75
+ PAndAS-04  000442.9+472142  0.05 22 0.00  -6.6 23.1
5.59  -108  2.5 MESSIER031   2.5  1.75
+ PAndAS-05  000524.1+435535  0.06 31 0.00  -4.5 25.6
4.75   103  2.8 MESSIER031   2.8  1.75
+ ESO409-015 000531.8-280553  3.00 78  23 0.00 -14.6 24.1
8.10  8.25   8.10  769 -2.0 NGC0024 -1.5 -2.05
+ AGC748778  000634.4+153039  0.61 70   3 0.00 -10.4 24.9
6.39  5.70   6.64  486 -1.9 NGC0253 -1.5 -2.72
+ And XX 000730.7+350756  0.20 33   5 0.00  -5.8 27.1
5.26  5.70-182  2.4 MESSIER031   2.4  1.75"
>
> start <- c(1, 20, 35, 41, 44, 48, 53, 59, 64, 69, 75, 77, 82, 87,
+92, 114, 121, 127)
> read_fwf(input, fwf_widths(diff(start)))
# A tibble: 7 x 17
  X1  X2X3X4X5X6X7X8
 X9   X10   X11   X12   X13   X14

 
1  And XVIII 000214.5+450520  0.6917 9 0  -8.7  26.8
6.44  6.78 <  6.65   -44   0.5
2  PAndAS-03 000356.4+405319  0.1017NA 0  -3.6  27.8
4.38NA  NANA   2.8
3  PAndAS-04 000442.9+472142  0.0522NA 0  -6.6  23.1
5.59NA  NA  -108   2.5
4  PAndAS-05 000524.1+435535  0.0631NA 0  -4.5  25.6
4.75NA  NA   103   2.8
5 ESO409-015 000531.8-280553  3.007823 0 -14.6  24.1
8.10  8.258.10   769  -2.0
6  AGC748778 000634.4+153039  0.6170 3 0 -10.4  24.9
6.39  5.706.64   486  -1.9
7 And XX 000730.7+350756  0.2033 5 0  -5.8  27.1
5.26  5.70  NA  -182   2.4
# ... with 3 more variables: X15 , X16 , X17 
>


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Thu, Oct 5, 2017 at 10:12 AM, jean-philippe
<jeanphilippe.fonta...@gssi.infn.it> wrote:
> dear R-users,
>
>
> I am facing a quite regular and basic problem when it comes to dealing with
> datasets, but I cannot find any satisfying answer so far.
> I have a messy dataset of galaxies like that :
>
> And XVIII  000214.5+450520  0.69 17   9 0.00  -8.7 26.8 6.44  6.78 <
> 6.65  -44  0.5 MESSIER031   0.6  1.54
> PAndAS-03  000356.4+405319  0.10 17 0.00  -3.6 27.8 4.38
> 2.8 MESSIER031   2.8  1.75
> PAndAS-04  000442.9+472142  0.05 22 0.00  -6.6 23.1 5.59
> -108  2.5 MESSIER031   2.5  1.75
> PAndAS-05  000524.1+435535  0.06 31 0.00  -4.5 25.6 4.75
> 103  2.8 MESSIER031   2.8  1.75
> ESO409-015 000531.8-280553  3.00 78  23 0.00 -14.6 24.1 8.10  8.25
> 8.10  769 -2.0 NGC0024 -1.5 -2.05
> AGC748778  000634.4+153039  0.61 70   3 0.00 -10.4 24.9 6.39  5.70
> 6.64  486 -1.9 NGC0253 -1.5 -2.72
> And XX 000730.7+350756  0.20 33   5 0.00  -5.8 27.1 5.26  5.70
> -182  2.4 MESSIER031   2.4  1.75
>
> What I would like to do is to read this dataset, but I would like the
> space between "And" and "XVIII" not to be interpreted as two different
> columns, and instead kept as the galaxy name in one column.
> How is it possible to do so?
>
> For instance I did this data1<-read.table("lvg_table2.txt",skip=70,fill=T)
> where I used fill=T because the rows don't have the same number of features
> since R splits the name of the galaxies into 2 columns because of the space.
>
>
> Best Regards, thanks in advance
>
>
> Jean-Philippe Fontaine
>
> --
> Jean-Philippe Fontaine
> PhD Student in Astroparticle Physics,
> Gran Sasso Science Institute (GSSI),
> Viale Francesco Crispi 7,
> 67100 L'Aquila, Italy
> Mobile: +393487128593, +33615653774
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
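Following Jim's advice, the 'Bytes' column of the header table can also be turned into widths for base R's `read.fwf()`; a minimal sketch with a made-up two-field layout (the field widths below are illustrative, not the full catalogue format — negative widths skip the gap characters):

```r
# build a fixed-width line: 18-char name, 1 gap, 15-char position, 1 gap, 5-char field
line <- sprintf("%-18s %-15s %5s", "And XVIII", "000214.5+450520", "0.69")

dat <- read.fwf(textConnection(line),
                widths      = c(18, -1, 15, -1, 5),   # negative = skip
                col.names   = c("Name", "Pos", "a26"),
                strip.white = TRUE)
dat$Name   # the embedded space in "And XVIII" is preserved
```

Because the name is read as one fixed-width field, the space inside it never splits into two columns.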


Re: [R] Calculating Weeks Since Last Event

2017-09-15 Thread jim holtman
Try this:


# supplied data
library(zoo)  # need the 'na.locf' function

x <- structure(list(ScanDate = structure(c(16433, 16440, 16447, 16454,
   16461, 16468, 16475, 16482,
16489, 16496, 16503, 16510, 16517,
   16524, 16531, 16538, 16545,
16552, 16559, 16566, 16573, 16580,
   16587, 16594, 16601, 16608,
16615, 16622), class = "Date"), OnPromotion =
  c(0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
0, 1, 1, 1, 1, 0,
0, 0, 1, 1, 1, 1)), .Names = c("ScanDate",
"OnPromotion"), sorted =
 "ScanDate", class = c("data.table",
   "data.frame"), row.names = c(NA, -28L))


# find where the promotions start and then create a flag that indicates when
# the previous promotion started
indx <- which(x$OnPromotion == 1)[1]  # get initial promotion
if (length(indx) == 0) stop('no promtions')  # make sure there is one
in the data

# add a column with the running total of promotions
x$count <- c(rep(0, indx - 1), seq(0, length = nrow(x) - indx + 1))
x$flag <- x$count  # save a copy

# now replace no promotions with NAs so we can use 'na.locf'
indx <- (x$OnPromotion == 0) & (x$count != 0)
x$flag[indx] <- NA
x$flag <- zoo::na.locf(x$flag)

# determine weeks since
x$weeks_since <- ifelse(x$count != 0,
x$count - x$flag + 1,
        0
)

x  # print out the result


##


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Fri, Sep 15, 2017 at 5:02 AM, Abhinaba Roy <abhinabaro...@gmail.com> wrote:
> Hi,
>
> I have an input data
>
>> dput (input)
>
> structure(list(ScanDate = structure(c(16433, 16440, 16447, 16454,
> 16461, 16468, 16475, 16482, 16489, 16496, 16503, 16510, 16517,
> 16524, 16531, 16538, 16545, 16552, 16559, 16566, 16573, 16580,
> 16587, 16594, 16601, 16608, 16615, 16622), class = "Date"), OnPromotion =
> c(0,
> 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0,
> 0, 0, 1, 1, 1, 1)), .Names = c("ScanDate", "OnPromotion"), sorted =
> "ScanDate", class = c("data.table",
> "data.frame"), row.names = c(NA, -28L))
>
> I am looking for an output
>
>> dput(output)
>
> structure(list(ScanDate = structure(c(16433, 16440, 16447, 16454,
> 16461, 16468, 16475, 16482, 16489, 16496, 16503, 16510, 16517,
> 16524, 16531, 16538, 16545, 16552, 16559, 16566, 16573, 16580,
> 16587, 16594, 16601, 16608, 16615, 16622), class = "Date"), OnPromotion =
> c(0,
> 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0,
> 0, 0, 1, 1, 1, 1), Weeks_Since_Last_Promo = c(0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 3, 4, 1,
> 1, 1)), .Names = c("ScanDate", "OnPromotion", "Weeks_Since_Last_Promo"
> ), sorted = "ScanDate", class = c("data.table", "data.frame"), row.names =
> c(NA,
> -28L))
>
> The logic :
>
> The data is weekly.
>
> I want to calculate the number of weeks elapsed since the last promotion
> (OnPromotion : 1 indicates promotion for that week and 0 indicates no
> promotion).
>
> As, there are no promotion initially we set the value for
> 'Weeks_Since_Last_Promo' to 0 (zero). The first promo occurs on
> '2015-03-02' and 'Weeks_Since_Last_Promo' is still 0. Moving to
> '2015-03-09' there was a promotion the week before and so 1 week elapsed
> after the last promo.
>
> If we look at '2015-06-15' then there was a promo 4 weeks back in the week
> of '2015-05-18' and so 'Weeks_Since_Last_Promo' = 4.
>
> How can we do it in R?
>
> Thanks,
> Abhinaba
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Merge by Range in R

2017-09-04 Thread jim holtman
Have you tried 'foverlaps' in the data.table package?
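A minimal foverlaps sketch for the interval join described below (toy values, not the poster's data; both tables need Start/End columns and the lookup table must be keyed):

```r
library(data.table)

# ranges to search within (stand-in for data_1)
ranges <- data.table(Chromosome = "chr1",
                     Start   = c(500001, 750001),
                     End     = c(750000, 800000),
                     Feature = c("chr1-0001", "chr1-0002"))

# point-like features to place (stand-in for data_2)
probes <- data.table(Chromosome = "chr1",
                     Start   = c(520000, 760000),
                     End     = c(520001, 760001),
                     Feature = c("cg01", "cg02"))

setkey(ranges, Chromosome, Start, End)
# each probe matched to the range that contains it, per chromosome
foverlaps(probes, ranges, type = "within", nomatch = 0L)
```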


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Mon, Sep 4, 2017 at 8:31 AM, Mohammad Tanvir Ahamed via R-help <
r-help@r-project.org> wrote:

> Hi,
> I have two big data set.
>
> data _1 :
> > dim(data_1)
> [1] 15820 5
>
> > head(data_1)
>Chromosome  StartEndFeature GroupA_3
> 1:   chr1 521369  75 chr1-00010.170
> 2:   chr1 750001  80 chr1-0002   -0.086
> 3:   chr1 81  85 chr1-00030.006
> 4:   chr1 850001  90 chr1-00040.050
> 5:   chr1 91  95 chr1-00050.062
> 6:   chr1 950001 100chr1-0006   -0.016
>
> data_2:
> > dim(data_2)
> [1] 470870 5
>
> > head(data_2)
>Chromosome Start   EndFeature GroupA_3
> 1:   chr1 15864 15865 cg138693410.207
> 2:   chr1 18826 18827 cg14008030   -0.288
> 3:   chr1 29406 29407 cg12045430   -0.331
> 4:   chr1 29424 29425 cg20826792   -0.074
> 5:   chr1 29434 29435 cg003816040.141
> 6:   chr1 68848 68849 cg20253340   -0.458
>
>
> What I want to do :
> Based on the columns "Chromosome", "Start" and "End" of the two data sets, I
> want to find which rows (precisely, which "Feature") of data_2 fall in each
> range (between "Start" and "End") of data_1. The "Chromosome" column should
> also match between the two data sets.
>
> I have tried "GenomicRanges" packages describe in the post
> https://stackoverflow.com/questions/11892241/merge-by-
> range-in-r-applying-loops
> But i was not successful. Can any one please help me to do this fast, as
> the data is very big ?
> Thanks in advance.
>
>
> Regards.
> Tanvir Ahamed Stockholm, Sweden |  mashra...@yahoo.com
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Has the for loop been improved in R

2017-08-07 Thread jim holtman
If you run it under the profiler in RStudio, you will see that the 'lm'
call takes about 2 seconds longer in the function, which might have to
do with resolving the reference.  So it is probably the function call in
'lapply' vs. the in-line statement in the 'for' loop that accounts for the
difference.  I have attached the output of the profiler.


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Mon, Aug 7, 2017 at 10:57 AM, Thierry Onkelinx <thierry.onkel...@inbo.be>
wrote:

> Dear Jesus,
>
> The difference is marginal when each code chunk does the same things. Your
> for loop does not yields the same output as the lapply. Here is the cleaned
> version of your code.
>
> n<-1
> set.seed(123)
> x<-rnorm(n)
> y<-x+rnorm(n)
> rand.data<-data.frame(x,y)
> k<-100
> samples <- split(sample(n), rep(seq_len(k),length=n))
>
> library(microbenchmark)
> microbenchmark(
>   "for" = {
> res <- vector("list", length(samples))
> for(index in seq_along(samples)) {
>   fit <- lm(y~x, data = rand.data[-samples[[index]],])
>   pred <- predict(fit, newdata = rand.data[samples[[index]],])
> >   res[[index]] <- ((pred - rand.data$y[samples[[index]]])^2)
> }
>   },
>   lapply = {
> cv.fold.fun <- function(index){
>   fit <- lm(y~x, data = rand.data[-samples[[index]],])
>   pred <- predict(fit, newdata = rand.data[samples[[index]],])
>   return((pred - rand.data$y[samples[[index]]])^2)
> }
> lapply(seq_along(samples), cv.fold.fun)
>   }
> )
>
> Unit: milliseconds
>expr  min   lq mean   median   uq  max neval cld
> for 866.4196 897.3137 949.8155 926.1918 946.8390 1767.463   100   a
>  lapply 837.7804 889.6620 947.2401 909.9946 939.6379 2476.415   100   a
>
> Best regards,
>
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> 2017-08-07 16:48 GMT+02:00 Jeff Newmiller <jdnew...@dcn.davis.ca.us>:
>
> > The lapply loop and the for loop have very similar speed characteristics.
> > Differences seen are almost always due to how you use memory in the body
> of
> > the loop. This fact is not new. You may be under the incorrect assumption
> > that using lapply is somehow equivalent to "vectorization", which it is
> not.
> > --
> > Sent from my phone. Please excuse my brevity.
> >
> > On August 7, 2017 7:29:58 AM PDT, "Jesús Para Fernández" <
> > j.para.fernan...@hotmail.com> wrote:
> > >Hi!
> > >
> > >I am doing a lapply and for comparaison and I get that for is faster
> > >than lapply.
> > >
> > >
> > >What I have done:
> > >
> > >
> > >
> > >n<-10
> > >set.seed(123)
> > >x<-rnorm(n)
> > >y<-x+rnorm(n)
> > >rand.data<-data.frame(x,y)
> > >k<-100
> > >samples<-split(sample(1:n),rep(1:k,length=n))
> > >
> > >res<-list()
> > >t<-Sys.time()
> > >for(i in 1:100){
> > >  modelo<-lm(y~x,rand.data[-samples[[i]]])
> > >  prediccion<-predict(modelo,rand.data[samples[[i]],])
> > >  res[[i]] <- (prediccion - rand.data$y[samples[[i]]])
> > >
> > >}
> > >print(Sys.time()-t)
> > >
> > >Which takes 8.042 seconds
> > >
> > >and using Lapply
> > >
> > >cv.fold.fun <- function(index){
> > >   fit <- lm(y~x, data = rand.data[-samples[[index]],])
> > >   pred <- predict(fit, newdata = rand.data[samples[[index]],])
> > >   return((pred - rand.data$y[samples[[index]]])^2)
> > >  }
> > >
> > >
> > >t<-Sys.time()
> > >
> > >nuevo<-lapply(seq(along = samples),cv.fold.fun)
> > >print(Sys.time()-t)
> > >
> > >
> > >Which takes 9.56 seconds.
> > >
> > >So... has been improved the F

Re: [R] Importing Big data to R

2017-07-12 Thread jim holtman
A little more information would be useful.  Why did it stop?  Was there an
error message?  Can you show the commands/console log of what you did?
Provide information on how much memory your computer has.  When the
operation completed, how much memory was in use?  An important aspect is how
many columns the data has.  How big was the file on disk?  What other
objects were in memory at the same time?  The list can go on and on, so
more information would be useful to understand the problem.


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Wed, Jul 12, 2017 at 2:44 AM, Mangalani Peter Makananisa <
pmakanan...@sars.gov.za> wrote:

> Dear R-Gurus,
>
> I am trying to read in data with 24,349,113 rows to R-3.3.3 (64 bit) and
> have used  the library   "data.table"  and It managed to read 23,347,070
> rows  and the remainder was 2,043 rows only.
>
> Could you please advise me as to which library/R-commands is suitable to
> read the full data in to R?
>
> Kind regards,
>
> Mangalani Peter Makananisa (5786)
> South African Revenue Service (SARS)
> +2782 456 4669 / +2712 422 7357
>
> Please Note: This email and its contents are subject to our email legal
> notice which can be viewed at http://www.sars.gov.za/Pages/
> Email-disclaimer.aspx
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extraneous full stop in csv read

2017-06-28 Thread jim holtman
or use the 'check.names = FALSE':

> x <- read.csv(text = '"yr","mo","Data","in"
+ 1895,1,8243,8.243
+ 1895,2,2265,2.265
+ 1895,3,2340,2.34
+ 1895,4,1014,1.014
+ 1895,5,1281,1.281
+ 1895,6,58,0.058
+ 1895,7,156,0.156
+ 1895,8,140,0.14
+ 1895,9,1087,1.087
+ 1895,10,322,0.322
+ 1895,11,1331,1.331
+ 1895,12,2428,2.428
+ 1896,1,7156,7.156
+ 1896,2,712,0.712
+ 1896,3,2982,2.982
+ ', check.names = FALSE)
> str(x)
'data.frame': 15 obs. of  4 variables:
 $ yr  : int  1895 1895 1895 1895 1895 1895 1895 1895 1895 1895 ...
 $ mo  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Data: int  8243 2265 2340 1014 1281 58 156 140 1087 322 ...
 $ in  : num  8.24 2.27 2.34 1.01 1.28 ...


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Wed, Jun 28, 2017 at 7:30 PM, John <j...@surewest.net> wrote:

> I ran into a puzzling minor behaviour I would like to understand.
> Reading in a csv file, I find an extraneous "." after a column header,
> "in" [short for "inches"] thus, "in.". Is this due to "in" being
> reserved?  I initially blamed this on RStudio or to processing the data
> through LibreCalc. However, the same result occurs in a console R
> session.  Sending the file to the console via less reveals no strange
> characters in the first line.  The data is California statewide
> rainfall which was screen captured from the Western Regional Climate
> Center web site.
>
> First 15 lines including header line:
>
> "yr","mo","Data","in"
> 1895,1,8243,8.243
> 1895,2,2265,2.265
> 1895,3,2340,2.34
> 1895,4,1014,1.014
> 1895,5,1281,1.281
> 1895,6,58,0.058
> 1895,7,156,0.156
> 1895,8,140,0.14
> 1895,9,1087,1.087
> 1895,10,322,0.322
> 1895,11,1331,1.331
> 1895,12,2428,2.428
> 1896,1,7156,7.156
> 1896,2,712,0.712
> 1896,3,2982,2.982
>
> File read in as follows:
>
> x <- read.csv('DRI-mo-prp.csv', header = T)
>
> Structure:
>
>  str(x)
> 'data.frame':   1469 obs. of  4 variables:
>  $ yr  : int  1895 1895 1895 1895 1895 1895 1895 1895 1895 1895 ...
>  $ mo  : int  1 2 3 4 5 6 7 8 9 10 ...
>  $ Data: int  8243 2265 2340 1014 1281 58 156 140 1087 322 ...
>  $ in. : num  8.24 2.27 2.34 1.01 1.28 ...
> [note "in" is now "in."]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Extraneous full stop in csv read

2017-06-28 Thread jim holtman
try the 'read_csv' function in the 'readr' package:

> x <- readr::read_csv('"yr","mo","Data","in"
+ 1895,1,8243,8.243
+ 1895,2,2265,2.265
+ 1895,3,2340,2.34
+ 1895,4,1014,1.014
+ 1895,5,1281,1.281
+ 1895,6,58,0.058
+ 1895,7,156,0.156
+ 1895,8,140,0.14
+ 1895,9,1087,1.087
+ 1895,10,322,0.322
+ 1895,11,1331,1.331
+ 1895,12,2428,2.428
+ 1896,1,7156,7.156
+ 1896,2,712,0.712
+ 1896,3,2982,2.982
+ ')
> str(x)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 15 obs. of  4 variables:
 $ yr  : int  1895 1895 1895 1895 1895 1895 1895 1895 1895 1895 ...
 $ mo  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Data: int  8243 2265 2340 1014 1281 58 156 140 1087 322 ...
 $ in  : num  8.24 2.27 2.34 1.01 1.28 ...


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Wed, Jun 28, 2017 at 7:30 PM, John <j...@surewest.net> wrote:

> I ran into a puzzling minor behaviour I would like to understand.
> Reading in a csv file, I find an extraneous "." after a column header,
> "in" [short for "inches"] thus, "in.". Is this due to "in" being
> reserved?  I initially blamed this on RStudio or to processing the data
> through LibreCalc. However, the same result occurs in a console R
> session.  Sending the file to the console via less reveals no strange
> characters in the first line.  The data is California statewide
> rainfall which was screen captured from the Western Regional Climate
> Center web site.
>
> First 15 lines including header line:
>
> "yr","mo","Data","in"
> 1895,1,8243,8.243
> 1895,2,2265,2.265
> 1895,3,2340,2.34
> 1895,4,1014,1.014
> 1895,5,1281,1.281
> 1895,6,58,0.058
> 1895,7,156,0.156
> 1895,8,140,0.14
> 1895,9,1087,1.087
> 1895,10,322,0.322
> 1895,11,1331,1.331
> 1895,12,2428,2.428
> 1896,1,7156,7.156
> 1896,2,712,0.712
> 1896,3,2982,2.982
>
> File read in as follows:
>
> x <- read.csv('DRI-mo-prp.csv', header = T)
>
> Structure:
>
>  str(x)
> 'data.frame':   1469 obs. of  4 variables:
>  $ yr  : int  1895 1895 1895 1895 1895 1895 1895 1895 1895 1895 ...
>  $ mo  : int  1 2 3 4 5 6 7 8 9 10 ...
>  $ Data: int  8243 2265 2340 1014 1281 58 156 140 1087 322 ...
>  $ in. : num  8.24 2.27 2.34 1.01 1.28 ...
> [note "in" is now "in."]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] reading data

2017-06-13 Thread jim holtman
You need to provide reproducible data.  What does the file contain?  Why
are you using 'sep=' when reading fixed-format data?  You might be able to
attach the '.txt' file to your email to help with the problem.  Also, you did
not state what differences you are seeing.  So help us out here.


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Tue, Jun 13, 2017 at 5:09 PM, Ashta <sewa...@gmail.com> wrote:

> Hi all,
>
> I am using R to extract  data on a regular basis.
> However, sometimes using the same script and the same data I am
> getting different observation.
> The library I am using and how I am reading  it is as follows.
>
> library(stringr)
> namelist <- file("Adress1.txt",encoding="ISO-8859-1")
> Name <- read.fwf(namelist,
> colClasses="character", skip=2,sep="\t",fill=T,
>   width =c(2,8,1,1,1,1,1,1,9,5)+1,col.names=ccol)
>
> Can some one suggest me how track the issue?
> Is it the library issue or Java issue?
> May I read as free format instead of fixed format?
>
> Thank you in advance
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Joining tables with different order and matched values

2017-05-14 Thread jim holtman
Here is a solution to the "shared values" question

> library(stringr)
> input <- read.table(text = "A B
+
+ 1,2,5   3,8,7
+
+ 2,4,6   7,6,3  ",
+ header = TRUE,
+ as.is = TRUE
+ )
>
> input$'shared values' <- apply(input, 1, function(x){
+ toString(intersect(str_extract_all(x[1], "[^,]")[[1]],
+   str_extract_all(x[2], "[^,]")[[1]]
+   ))
+ })
>
> input
  A     B shared values
1 1,2,5 3,8,7
2 2,4,6 7,6,3 6
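Note that the pattern "[^,]" above extracts single characters, so multi-digit values (e.g. 12) would be split apart. A strsplit-based variant handles those as well (a sketch on the same toy input, not run against any larger data):

```r
input <- data.frame(A = c("1,2,5", "2,4,6"),
                    B = c("3,8,7", "7,6,3"),
                    stringsAsFactors = FALSE)

# split each cell on commas, intersect, and report NA when nothing is shared
input$`shared values` <- mapply(function(a, b) {
  s <- intersect(strsplit(a, ",")[[1]], strsplit(b, ",")[[1]])
  if (length(s) == 0) NA_character_ else toString(s)
}, input$A, input$B)

input  # row 1: NA, row 2: "6"
```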



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Mon, May 8, 2017 at 10:56 AM, abo dalash <abo_d...@hotmail.com> wrote:

> Hi All ..,
>
>
> I have 2 tables and I'm trying to have some information from the 1st table
> to appear in the second table with different order.
>
>
> For Example, let's say this is my 1st table :-
>
>
>
> Drug name   indications
>
>  IbuprofenPain
>
>  Simvastatinhyperlipidemia
>
> losartan   hypertension
>
>
>
> my 2nd table is in different order for the 1st column :-
>
>
> Drug name   indications
>
>
> Simvastatin
>
> losartan
>
> Ibuprofen
>
> Metformin
>
>
> I wish to see the indication of each drug in my 2nd table subsisted from
> the information in my 1st table so the final table
>
> would be like this
>
>
> Drug name   indications
>
>
> Simvastatin hyperlipidemia
>
> losartan   hypertension
>
> Ibuprofen   pain
>
> MetforminN/A
>
>
> I have been trying to use Sqldf package and right join function but not
> able to formulate the correct syntax.
>
>
> I'm also trying to identify rows contain at least one shared value  in a
> dataset called 'Values":
>
>
> >Values
>
> A B
>
> 1,2,5   3,8,7
>
> 2,4,6   7,6,3
>
>
>
> Columns A & B in the first row do not share any value while in the 2nd row
> they have a single shared value which is 6.
>
> The result I wish to see :-
>
>
> A B shared values
>
> 1,2,5   3,8,7 N/A
>
> 2,4,6   7,6,3   6
>
>
> I tried this syntax : SharedValues <- Values$A == Values$B but this
> returns logical results and what I wish to have
>
> is a new data frame including the new vector "shared values" showing the
> information exactly as above.
>
>
>
>
> Kind Regards
>
>
>
>
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Return value from function with For loop

2017-04-16 Thread jim holtman
In the first case the last statement in the function is a 'for' loop, and
the value of a 'for' loop is NULL, so that is what is returned.  For example:

> print(for (i in 1:4) i+1)
NULL

In the second case, the last statement is the expression '(n+1)', which
gives you the correct value:

> xx <- function(n) n+1
> print(xx(3))
[1] 4
>




Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Sun, Apr 16, 2017 at 10:26 PM, Ramnik Bansal <ramnik.ban...@gmail.com> wrote:
> In the code below
>
>
> ff <- function(n){ for(i in 1:n) (i+1) }
>
> n <- 3; ff(n) -> op; print(op)
>
> Why doesn't print(op) print 4 instead of NULL? Isn't the last line of
> code executed i+1, and therefore shouldn't that be returned instead of
> NULL?
>
> Instead, if I say
> ff <- function(n){ (n+1) }
>
> then
> n <- 3; ff(n) -> op; rm(n); print(op)
> gives 4 as output.
>
> My question is: which is considered the last line in a function for the
> purpose of the default return value? And under what conditions?
>
> -Thanks,
> Ramnik
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] help on readBin in R

2017-04-16 Thread jim holtman
If the file is not too large, just change the extension to '.txt' and
attach it.  Also include the code that you are using to read it in and
a definition of what the data is; e.g., the first two bytes are
temperature, the next four bytes are a station ID, 

Here is an example of reading in a binary file and I know that the
'raw' output matches the bytes that are in the file:

> infile <- file("test.txt", 'rb')
>
> input <- readBin(infile, raw(), 100)
>
> input
  [1] 50 4b 03 04 14 00 04 00 08 00 47 95 90 4a 9f 00 7a a0 99 01 00
00 7a 08 00 00 13 00 75 00 5b 43 6f
 [34] 6e 74 65 6e 74 5f 54 79 70 65 73 5d 2e 78 6d 6c 53 44 60 00 a4
00 00 00 00 08 00 32 fa a9 3f 63 64
 [67] 60 69 11 61 60 60 30 00 62 10 f0 01 62 46 56 30 93 55 14 48 55
e8 cd 15 fe e5 cf a3 df 6c ab ed 66
[100] b1
>

here is the dump:

$ od -a -t x --endian=big test.txt
000   P   K etx eot dc4 nul eot nul  bs nul   G nak dle   J  us nul
   504b03041400040008004795904a9f00
020   z  sp  em soh nul nul   z  bs nul nul dc3 nul   u nul   [   C
   7aa099017a08130075005b43
040   o   n   t   e   n   t   _   T   y   p   e   s   ]   .   x   m
   6f6e74656e745f54    797065735d2e786d

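Once the byte layout is known, readBin can decode typed values directly instead of raw bytes. A minimal round trip, assuming a hypothetical layout of 4-byte big-endian integers:

```r
# write three integers to a binary file with a known layout
con <- file("demo.bin", "wb")
writeBin(c(250L, 375L, 500L), con, size = 4, endian = "big")
close(con)

# read them back with a matching type, count, size, and endianness
con <- file("demo.bin", "rb")
vals <- readBin(con, what = "integer", n = 3, size = 4, endian = "big")
close(con)
vals  # 250 375 500
```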
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Sun, Apr 16, 2017 at 9:21 PM, Jeff Newmiller
<jdnew...@dcn.davis.ca.us> wrote:
> The mailing list has tight restrictions on attachments, so your attachment 
> was not let through. Read the Posting Guide, and note that sometimes success 
> requires some extended understanding of how your mail software works, and we 
> probably don't know the details either. You might have success changing the 
> file extension or sending a link to the file on a file storage website like 
> Google Drive or Dropbox.
> --
> Sent from my phone. Please excuse my brevity.
>
> On April 16, 2017 3:49:07 PM PDT, "M.M saifuddin" <mmsaifuddi...@gmail.com> 
> wrote:
>>I need to view the attached  binary file. but can not read it, instead
>>am
>>getting very weird( i think garbage) numbers.
>>
>>The values are Temperature data so it should be somewhat in between 250
>>to
>>500.
>>
>>Can any altruist view it and give me the R code to view it.
>>
>>I am attaching the file. Please help me if you can.
>>
>>TIA
>>__
>>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide
>>http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] taking a small piece of large tiff

2017-04-05 Thread jim holtman
if you have 8GB of memory it should be easy to handle.

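If the image does fit in memory, cropping is just matrix subsetting on the array that readTIFF (tiff package) returns; the file names and pixel ranges here are hypothetical:

```r
library(tiff)

img <- readTIFF("world.tif")        # grayscale TIFF -> matrix of values in [0, 1]
uk  <- img[1000:3000, 15000:17000]  # rows/cols covering the region of interest
writeTIFF(uk, "uk.tif")             # write the cropped piece back out
```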

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Wed, Apr 5, 2017 at 3:23 AM, Louisa Reynolds
<louisa_reyno...@yahoo.co.uk> wrote:
> Ok. I have a tiff of size over 2GB. It covers a sixth of the Earth's surface 
> and I'm trying to cut a UK piece out of it. The tiff I start with seems to be 
> too large for R to handle.
>
>
> Sent from my iPhone
>
>> On 4 Apr 2017, at 18:37, jim holtman <jholt...@gmail.com> wrote:
>>
>> How big is 'large'?
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>>
>> On Tue, Apr 4, 2017 at 7:47 AM, Louisa Reynolds via R-help
>> <r-help@r-project.org> wrote:
>>> Dear Forum
>>> I am trying to cut out a small section of a very large 2-dimensional 
>>> grayscale image as a tiff in R, but it is having difficulty handling such 
>>> large files.  I have looked at bigmemory and ff packages but it is unclear 
>>> how I can use these packages with tiffs. Can anyone please suggest 
>>> something? I have tried tiff and rtiff libraries.
>>> Thanks in advance.
>>>[[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] taking a small piece of large tiff

2017-04-04 Thread jim holtman
How big is 'large'?

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Tue, Apr 4, 2017 at 7:47 AM, Louisa Reynolds via R-help
<r-help@r-project.org> wrote:
> Dear Forum
> I am trying to cut out a small section of a very large 2-dimensional 
> grayscale image as a tiff in R, but it is having difficulty handling such 
> large files.  I have looked at bigmemory and ff packages but it is unclear 
> how I can use these packages with tiffs. Can anyone please suggest something? 
> I have tried tiff and rtiff libraries.
> Thanks in advance.
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Getting an unexpected extra row when merging two dataframes

2017-03-30 Thread jim holtman
you need to show what 'str' shows for the data structure

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Thu, Mar 30, 2017 at 12:08 AM, paulberna...@gmail.com
<paulberna...@gmail.com> wrote:
> Dear Jim,
>
> Thank you for your kind reply. However I forgot to tell you that the data
> was actually read from a Microsoft SQL Server database, so I used a select
> statement to read (import) it.
>
> I am working with the R script module of Microsoft Azure Machine Learning
> Studio, and I used an SQL connection to read in the table.
>
> That being said, how can I fix the issue?
>
> Best regards,
>
> Paul
>
>
>
 Original message 
Subject: Re: [R] Getting an unexpected extra row when merging two dataframes
From: jim holtman
To: Paul Bernal
CC: r-help@r-project.org
>
>
> first of all when you read the data in you get 379 rows of data since
> you did not say 'header = TRUE' in the read.table. Here is what the
> first 6 lines of you data are:
>
>> dataset1 <- read.table('/users/jh52822/downloads/containertestdata.txt')
>>
>> str(dataset1)
> 'data.frame': 379 obs. of 2 variables:
> $ V1: Factor w/ 379 levels "1-Apr-00","1-Apr-01",..: 379 333 301 80
> 145 113 239 18 270 207 ...
> $ V2: Factor w/ 66 levels "10","11","12",..: 66 46 57 5 39 48 40 61 10 18
> ...
>> View(dataset1)
>> head(dataset1)
> V1 V2
> 1 TransitDate Transits
> 2 1-Oct-85 55
> 3 1-Nov-85 66
> 4 1-Dec-85 14
> 5 1-Jan-86 48
> 6 1-Feb-86 57
>>
>
> You need to learn to use 'str' to look at the structure. So when you
> are converting the dates, you will get an NA because the first row has

Re: [R] Getting an unexpected extra row when merging two dataframes

2017-03-29 Thread jim holtman
first of all, when you read the data in, you get 379 rows of data since
you did not say 'header = TRUE' in the read.table.  Here is what the
first 6 lines of your data are:

> dataset1 <- read.table('/users/jh52822/downloads/containertestdata.txt')
>
> str(dataset1)
'data.frame':   379 obs. of  2 variables:
 $ V1: Factor w/ 379 levels "1-Apr-00","1-Apr-01",..: 379 333 301 80
145 113 239 18 270 207 ...
 $ V2: Factor w/ 66 levels "10","11","12",..: 66 46 57 5 39 48 40 61 10 18 ...
> View(dataset1)
> head(dataset1)
           V1       V2
1 TransitDate Transits
2    1-Oct-85       55
3    1-Nov-85       66
4    1-Dec-85       14
5    1-Jan-86       48
6    1-Feb-86       57
>

You need to learn to use 'str' to look at the structure.  So when you
are converting the dates, you will get an NA because the first row has
"TransitDate".  Now if you had used 'header = TRUE', your data would
look like this:

> dataset1 <- read.table('/users/jh52822/downloads/containertestdata.txt',
+ header = TRUE,
+ as.is = TRUE  # prevent conversion to factors
+ )
>
> str(dataset1)
'data.frame':   378 obs. of  2 variables:
 $ TransitDate: chr  "1-Oct-85" "1-Nov-85" "1-Dec-85" "1-Jan-86" ...
 $ Transits   : int  55 66 14 48 57 49 70 19 27 28 ...
> head(dataset1)
  TransitDate Transits
1    1-Oct-85       55
2    1-Nov-85       66
3    1-Dec-85       14
4    1-Jan-86       48
5    1-Feb-86       57
6    1-Mar-86       49
>

So try again.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Wed, Mar 29, 2017 at 11:02 AM, Paul Bernal <paulberna...@gmail.com> wrote:
> Hello everyone,
>
> Hope you are all doing great. So I have two datasets:
>
> -dataset1Frame: which contains the historical number of transits from
> october 1st, 1985 up to march 1, 2017. It has two columns, one called
> TransitDate and the other called Transits. dataset1Frame is a table coming
> from an SQL Server Database.
>
> -TransitDateFrame: a made up dataframe that goes from october 1st, 1985 up
> to the last date available in dataset1Frame.
>
> Note: The reason why I made up TransitDateFrame is that dataset1Frame
> sometimes has missing observations (some dates do not exist), and I
> just want to make sure I have all the dates available from october 1, 1985
> up to the last available observation.
> The idea is to leave the transits that do exist as they come, and add the
> missing dates as additional rows (observations) with a value of NA for the
> transits.
>
> That being said, here is the code:
>
>>install.packages("src/lubridate_1.6.0.zip", lib=".", repos=NULL,
> verbose=TRUE)
>>library(lubridate, lib.loc=".", verbose=TRUE)
>>library(forecast)
>>library(tseries)
>>library(stats)
>>library(stats4)
>
>>dataset1 <-read.table("CONTAINERTESTDATA.txt")
>
>
>>dataset1Frame<-data.frame(dataset1)
>
>>dataset1Frame$TransitDate<-as.Date(dataset1Frame$TransitDate, "%Y-%m-%d")
>
>>TransitDate<-seq(as.Date("1985-10-01"),
> as.Date(dataset1Frame[nrow(dataset1Frame),1]), "months")
>
>>TransitDate["Transits"]<-NA
>
>>TransitDateFrame<-data.frame(TransitDate)
>
>>NewTransitsFrame<-merge(dataset1Frame,TransitDateFrame, all.y=TRUE)
>
> #Output from resulting dataframes
>
>>TransitDateFrame
>
>>NewTransitsFrame
>
> Why is there an additional row(observation) with a value of NA if I
> specified that the dataframe should only go to the last observation? There
> should be 378 observations at the end and I get 379 observations instead.
>
> The reason I am doing it this way is because this is how I got to fill in
> the gaps in dates (whenever there are nonexistent observations/missing
> data).
>
> Any guidance will be greatly appreciated.
>
> I am attaching a .txt file as a reference,
>
> Best regards,
>
> Paul
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

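[Editorial note] The gap-filling approach discussed in this thread can be sketched in a few lines. This is an illustration with made-up data standing in for the poster's transit table, not code from the thread:

```r
# Hypothetical illustration of filling monthly date gaps with merge();
# the data frame here is made up (November is deliberately missing).
dat <- data.frame(
  TransitDate = as.Date(c("1985-10-01", "1985-12-01")),
  Transits    = c(55, 14)
)
all_months <- data.frame(
  TransitDate = seq(min(dat$TransitDate), max(dat$TransitDate), by = "months")
)
filled <- merge(dat, all_months, all.y = TRUE)  # keep every month; NA where absent
filled
#   TransitDate Transits
# 1  1985-10-01       55
# 2  1985-11-01       NA
# 3  1985-12-01       14
```

Because all_months runs only to the last observed date, no extra row is produced -- the stray 379th row in the original post came from the header line being read in as data.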


Re: [R] Display data by condition

2017-03-16 Thread jim holtman
you are probably missing a comma:

View(data[data$fact > 5000, ])


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Thu, Mar 16, 2017 at 11:16 AM, Juan Ceccarelli Arias <jfca...@gmail.com>
wrote:

> Hello,
> I need to show the observations of a data set only if the earn more than
> $5000 (fact is its name in the date set). I use this:
>
> View(data[data$fact>5000])
>
> The code above shows nothing. No error or message at all.
> What am i doing wrong?
> Thanks for your help and time.
>
>


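[Editorial note] A minimal sketch of why the comma matters, using a tiny made-up data frame (not the poster's data):

```r
# The comma separates row and column indices: data[rows, columns].
data <- data.frame(fact = c(4000, 6000, 7000), name = c("a", "b", "c"))
over5000 <- data[data$fact > 5000, ]  # rows where fact > 5000, all columns
# Without the comma, data[data$fact > 5000] treats the logical vector as a
# COLUMN selector, which typically errors ("undefined columns selected")
# instead of filtering rows.
over5000
```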


Re: [R] Beginner needs help with R

2017-02-06 Thread jim holtman
You need the leading zeros, but numeric values just give the number without
leading zeros.  You can use 'sprintf' to create a character string with
the leading zeros:

> # this is using 'numeric' and drops leading zeros
>
> seq1 <- paste("DQ", seq(060054, 060060), sep = "")
> seq1
[1] "DQ60054" "DQ60055" "DQ60056" "DQ60057" "DQ60058" "DQ60059" "DQ60060"
>
> # use 'sprintf' to create leading zeros
> seq2 <- paste0("DQ", sprintf("%06d", seq(060054, 060060)))
> seq2
[1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059"
"DQ060060"
>


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Feb 5, 2017 at 8:50 PM, Nabila Arbi <nabilaelarbi1...@gmail.com>
wrote:

> Dear R-Help Team!
>
> I have some trouble with R. It's probably nothing big, but I can't find a
> solution.
> My problem is the following:
> I am trying to download some sequences from ncbi using the ape package.
>
> seq1 <- paste("DQ", seq(060054, 060060), sep = "")
>
> sequences <- read.GenBank(seq1,
> seq.names = seq1,
> species.names = TRUE,
> gene.names = FALSE,
> as.character = TRUE)
>
> write.dna(sequences, "mysequences.fas", format = "fasta")
>
> My problem is, that R doesn't take the whole sequence number as "060054"
> but it puts it as DQ60054 (missing the zero in the beginning, which is
> essential).
>
> Could please tell me, how I can get R to accepting the zero in the
> beginning of the accession number?
>
> Thank you very much in advance and all the best!
>
> Nabila
>
>

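[Editorial note] An equivalent of the sprintf() idiom above is formatC(); either way, the key point is that the IDs must be built as character strings, since numerics cannot carry leading zeros:

```r
# Both produce zero-padded six-digit accession numbers.
seq2 <- paste0("DQ", sprintf("%06d", 60054:60060))
seq3 <- paste0("DQ", formatC(60054:60060, width = 6, flag = "0"))
seq2
# [1] "DQ060054" "DQ060055" "DQ060056" "DQ060057" "DQ060058" "DQ060059" "DQ060060"
```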


Re: [R] Source into a specified environment

2017-01-09 Thread jim holtman
?sys.source

Here is an example of the way I use it:

# read my functions into a environment
.my.env.jph <- new.env()
sys.source('~/C_Drive/perf/bin/perfmon.r', envir=.my.env.jph)
attach(.my.env.jph)


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Mon, Jan 9, 2017 at 11:21 AM, <g.maub...@weinwolf.de> wrote:

> Hi All,
>
> I wish everyone a happy new year.
>
> I have the following code:
>
> -- cut --
>
> modules <- c("t_calculate_RFM_model.R", "t_count_na.R",
> "t_export_table_2_xls.R",
>  "t_find_duplicates_in_variable.R",
> "t_find_originals_and_duplicates.R",
>  "t_frequencies.R", "t_inspect_dataset.R",
> "t_merge_variables.R",
>  "t_openxlsx_shortcuts.r", "t_rename_variables.R",
> "t_select_chunks.R")
>
> toolbox <- new.env(parent = emptyenv())
>
> for (file in modules)
> {
>   source(file = file.path(
> c_path_full$modules,  # path to modules
> file),
> echo = TRUE)
> }
>
> -- cut --
>
> I would like to know how I can source the modules into the newly created
> environment called "toolbox"?
>
> I had a look at the help file for ?source but this function can only read
> into the current environment or the global environment (= default).
>
> I tried also the following
>
> -- cut --
>
> for (file in modules)
> {
>   do.call(
> what = "source",
> args = list(
>   file = file.path(c_path_full$modules,
>file),
>   echo = TRUE
> ),
> envir = toolbox
>   )
> }
>
> -- cut --
>
> But this did not work, i. e. it did not load the modules into the
> environment "toolbox" but into the .GlobalEnv.
>
> I also had a look at "assign", but assign() asks for a name of an object
> in quotes. This way I could not figure out how to use it in a loop or
> function to name the element in "toolbox" after the modules names:
>
> assign("t_add_sheet", t_add_sheet, envir = toolbox)  # works
> assign(quote(t_add_sheet), t_add_sheet, envir = toolbox)  # does NOT work
> assign(as.name(t_add_sheet), t_add_sheet, envir = toolbix)  # does NOT
> work
>
>
> Is there a way to load the modules directly into the "toolbox"
> environment?
>
> Kind regards
>
> Georg
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

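[Editorial note] A self-contained sketch of the sys.source() approach, using a temporary file in place of the poster's module files. One caveat worth noting: new.env(parent = emptyenv()), as in the original post, does not work with sys.source(), because code evaluated there cannot find base functions through the parent chain; a default new.env() is fine.

```r
toolbox <- new.env()                       # parent defaults to the calling env
tmp <- tempfile(fileext = ".R")            # stand-in for a module file
writeLines("t_double <- function(x) 2 * x", tmp)

sys.source(tmp, envir = toolbox)           # source INTO the environment

toolbox$t_double(21)                       # 42 -- lives in 'toolbox'
exists("t_double")                         # FALSE -- nothing leaked globally
```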


Re: [R] re attach into the killed R session

2016-12-18 Thread jim holtman
I would hope that if you have an R script that is running for a month that
you have built in periodic checkpoints so that you can recover what is
happening.  In cases where I want to be able to restart an R script at some
point downstream, I will "save.image", or just the objects that are
important, and then I can reload at that point and carry forward.  Also you
will have the objects that you need to examine if you have too.  If I had
something running that long, I would at least take a checkpoint every hour
to help in the debugging/recovery process.


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Dec 18, 2016 at 2:22 PM, Ragia . <ragi...@hotmail.com> wrote:

>
> Dear group
> I had a tmux session, on it an R script is running before the program
> should ends
>
> on the screen written "killed" and the script terminated and returned bake
> to bash (in the same tmux window)
>
> Q: how can I re attach into the killed R session and check it? can I
> recover what was the script doing for more than a month now ?
> any ideas?
> THANS
>
>
> Ragia A. Ibrahim
>
>
>
>

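[Editorial note] The checkpointing idea can be sketched as follows; saveRDS()/readRDS() on a single state object is a lighter-weight alternative to save.image(). The file name and checkpoint interval are placeholders.

```r
checkpoint_file <- file.path(tempdir(), "checkpoint.rds")

state <- list(iter = 0, result = 0)
for (i in 1:100) {
  state$iter   <- i
  state$result <- state$result + i
  if (i %% 25 == 0) saveRDS(state, checkpoint_file)  # e.g. hourly in real use
}

# After a crash, reload the last checkpoint and resume from state$iter + 1:
restored <- readRDS(checkpoint_file)
restored$result  # 5050
```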


Re: [R] Question about proxy setting of R

2016-12-05 Thread jim holtman
You will probably have to check with your network folks to see what is
possible on your system.


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Mon, Dec 5, 2016 at 6:33 AM, <qwertyui_per...@yahoo.co.jp> wrote:

> Dear Jim,
>
> Thanks to your advice, "Proxy Authentification" window showed up, however,
> I couldn't access to the internet. Error messages are as below.
>
> -- -- ---
> > update.packages(ask='graphics',checkBuilt=TRUE)
> --- Please select a CRAN mirror for use in this session ---
> Warning: failed to download mirrors file (scheme not supported in URL
> 'https://cran.r-project.org/CRAN_mirrors.csv'); using local file
> 'C:/PROGRA~1/R/R-33~1.2/doc/CRAN_mirrors.csv'
> Warning: unable to access index for repository
> https://cran.ism.ac.jp/src/contrib:
>   scheme not supported in URL 'https://cran.ism.ac.jp/src/contrib/PACKAGES'
>
> Proxy authentication failed:
> please re-enter the credentials or hit Cancel
> -- -- ---
>
> I assume the proxy server is only available for "http", not "https".
> What should I do ?
>
> J J
>
>
> - Original Message -
> *From:* jim holtman <jholt...@gmail.com>
> *To:* qwertyui_per...@yahoo.co.jp
> *Date:* 2016/12/2, Fri 09:13
> *Subject:* Re: [R] Question about proxy setting of R
>
> Try this option:
>
>  options(download.file.method = "internal")
>
>
>
>
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> On Wed, Nov 30, 2016 at 10:37 PM, <qwertyui_per...@yahoo.co.jp> wrote:
>
> Hello,
>
> I use R 3.0.2 on Win 7 through proxy server using ".Rprofile" in home
> directory that includes "Sys.setenv(http_proxy=proxy_server:port)".
> There has been no problem to access the internet for some years.
> In this situation, I installed R 3.3.1 and then entered "update.packages
> ()", however, "Proxy Authentification" window didn't show up and
> failed to access the internet. Error messages are as below.
>
> -- -- ---
> > update.packages(ask='graphics', checkBuilt=TRUE)
> --- Please select a CRAN mirror for use in this session ---
> Warning: failed to download mirrors file (cannot open URL
> 'https://cran.r-project.org/CRAN_mirrors.csv'); using local file
> 'C:/PROGRA~1/R/R-33~1.2/doc/ CRAN_mirrors.csv'
> Warning: unable to access index for repository
> https://cran.ism.ac.jp/src/contrib:
>   cannot open URL 'https://cran.ism.ac.jp/src/contrib/PACKAGES'
> Warning: unable to access index for repository
> http://www.stats.ox.ac.uk/pub/RWin/src/contrib:
>   cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/src/contrib/PACKAGES
> '
> Warning message:
> In download.file(url, destfile = f, quiet = TRUE) :
>   cannot open URL 'https://cran.r-project.org/CRAN_mirrors.csv': HTTP
> status was '407 Proxy Authentication Required'
>
> -- -- ---
>
> Strange to say, R 3.0.2 is able to access to the internet, and R 3.3.1
> shows collect proxy setting in ".Rprofile"  by "Sys.getenv("http_proxy")"
> From internet information, I added "http_proxy_user=ask" to ".Rprofile", or
> " --internet2" to the desktop icon of R 3.3.1, ending up in the same
> result.
>
> Please show me the way of proxy setting of R 3.3.1.
>
>
>
>
>
>

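[Editorial note] For reference, the usual .Rprofile shape for this situation is sketched below; the host, port, and credentials are placeholders, and whether the proxy passes https traffic at all is a question for the network administrators, as noted above.

```r
# Hypothetical .Rprofile proxy settings -- host/port/credentials are placeholders.
Sys.setenv(http_proxy  = "http://user:password@proxy.example.com:8080",
           https_proxy = "http://user:password@proxy.example.com:8080")
# The https_proxy variable matters here: CRAN mirror URLs are https://,
# so setting http_proxy alone (as in the poster's .Rprofile) is not enough.
options(download.file.method = "internal")  # the method suggested earlier in the thread
```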


Re: [R] data

2016-12-03 Thread jim holtman
This should be reasonably efficient with 'dplyr':

> library(dplyr)
> input <- read.csv(text = "state,city,x
+ 1,12,100
+ 1,12,100
+ 1,12,200
+ 1,13,200
+ 1,13,100
+ 1,13,100
+ 1,14,200
+ 2,21,200
+ 2,21,200
+ 2,21,100
+ 2,23,100
+ 2,23,200
+ 2,34,200
+ 2,34,100
+ 2,35,100")
>
> result <- input %>%
+ group_by(state) %>%
+ summarise(nCities = length(unique(city)),
+ count = n(),
+ `100's` = sum(x == 100),
+ `200's` = sum(x == 200)
+ )
> result
# A tibble: 2 × 5
  state nCities count `100's` `200's`
  <int>   <int> <int>   <int>   <int>
1     1       3     7       4       3
2     2       4     8       4       4


Or you can also use data.table:

> library(data.table)
> input <- fread("state,city,x
+ 1,12,100
+ 1,12,100
+ 1,12,200
+ 1,13,200
+ 1,13,100
+ 1,13,100
+ 1,14,200
+ 2,21,200
+ 2,21,200
+ 2,21,100
+ 2,23,100
+ 2,23,200
+ 2,34,200
+ 2,34,100
+ 2,35,100")
>
> input[, .(nCities = length(unique(city)),
+   count = .N,
+   `100's` = sum(x == 100),
+   `200's` = sum(x == 200)
+   )
+ , keyby = state
+ ]
   state nCities count 100's 200's
1:     1       3     7     4     3
2:     2       4     8     4     4



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, Dec 3, 2016 at 10:40 AM, Val <valkr...@gmail.com> wrote:

> Hi all,
>
> I am trying to read and summarize  a big data frame( >10M records)
>
> Here is the sample of my data
> state,city,x
> 1,12,100
> 1,12,100
> 1,12,200
> 1,13,200
> 1,13,100
> 1,13,100
> 1,14,200
> 2,21,200
> 2,21,200
> 2,21,100
> 2,23,100
> 2,23,200
> 2,34,200
> 2,34,100
> 2,35,100
>
> I want to get the total count by state, and the number of cities
> by state. The x variable is either 100 or 200, and I want a count of each.
>
> The result should look like as follows.
>
> state,city,count,100's,200's
> 1,3,7,4,3
> 2,4,8,4,4
>
> At the present I am doing it  in several steps and taking too long
>
> Is there an efficient way of doing this?
>
>

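[Editorial note] For completeness, the same summary can be done in base R with table() and tapply(); a sketch on a trimmed-down version of the sample data (the result column names here are chosen for illustration):

```r
input <- read.csv(text = "state,city,x
1,12,100
1,12,100
1,12,200
2,21,200
2,21,100
2,23,100")

result <- data.frame(
  state   = as.numeric(names(table(input$state))),
  nCities = as.vector(tapply(input$city, input$state, function(v) length(unique(v)))),
  count   = as.vector(table(input$state)),             # rows per state
  n100    = as.vector(tapply(input$x == 100, input$state, sum)),
  n200    = as.vector(tapply(input$x == 200, input$state, sum))
)
result
```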

Re: [R] About data manipulation

2016-11-26 Thread jim holtman
just assign it to an object:

x <- DT %>% ...   # i.e., the pipeline from the previous message


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Nov 27, 2016 at 2:03 AM, lily li <chocol...@gmail.com> wrote:

> Thanks Jim, this method is very convenient and is what I want. Could I
> know how to save the resulting dataframe? It printed to the console directly.
>
> On Sat, Nov 26, 2016 at 5:55 PM, jim holtman <jholt...@gmail.com> wrote:
>
>> You did not provide any data, but I will take a stab at it using the
>> "dplyr" package
>>
>> library(dplyr)
>> DT %>%
>> group_by(month, id, note) %>%
>> summarise(avg = mean(total))
>>
>>
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>> On Sat, Nov 26, 2016 at 11:11 AM, lily li <chocol...@gmail.com> wrote:
>>
>>> Hi R users,
>>>
>>> I'm trying to manipulate a dataframe and have some difficulties.
>>>
>>> The original dataset is like this:
>>>
>>> DF
>>> year   month   total   id note
>>> 2000 1 98GA   1
>>> 2001 1100   GA   1
>>> 2002 2 99GA   1
>>> 2002 2 80GB   1
>>> ...
>>> 2012 1 78GA   2
>>> ...
>>>
>>> The structure is like this: when year is between 2000-2005, note is 1;
>>> when
>>> year is between 2006-2010, note is 2; GA, GB, etc represent different
>>> groups, but they all have years 2000-2005, 2006-2010, 2011-2015.
>>> I want to calculate one average value for each month in each time slice.
>>> For example, between 2000-2005, when note is 1, for GA, there is one
>>> value
>>> in month 1, one value in month 2, etc; for GB, there is one value in
>>> month
>>> 1, one value in month 2, between this time period. So later, there is no
>>> 'year' column, but other columns.
>>> I tried the script: DF_GA = aggregate(total~year+month,data=subset(DF,
>>> id==GA==1)), but it did not give me the ideal dataframe. How to do
>>> then?
>>> Thanks for your help.
>>>
>>>
>>
>>
>



Re: [R] About data manipulation

2016-11-26 Thread jim holtman
You did not provide any data, but I will take a stab at it using the
"dplyr" package

library(dplyr)
DT %>%
group_by(month, id, note) %>%
summarise(avg = mean(total))



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, Nov 26, 2016 at 11:11 AM, lily li <chocol...@gmail.com> wrote:

> Hi R users,
>
> I'm trying to manipulate a dataframe and have some difficulties.
>
> The original dataset is like this:
>
> DF
> year   month   total   id note
> 2000 1 98GA   1
> 2001 1100   GA   1
> 2002 2 99GA   1
> 2002 2 80GB   1
> ...
> 2012 1 78GA   2
> ...
>
> The structure is like this: when year is between 2000-2005, note is 1; when
> year is between 2006-2010, note is 2; GA, GB, etc represent different
> groups, but they all have years 2000-2005, 2006-2010, 2011-2015.
> I want to calculate one average value for each month in each time slice.
> For example, between 2000-2005, when note is 1, for GA, there is one value
> in month 1, one value in month 2, etc; for GB, there is one value in month
> 1, one value in month 2, between this time period. So later, there is no
> 'year' column, but other columns.
> I tried the script: DF_GA = aggregate(total~year+month,data=subset(DF,
> id==GA==1)), but it did not give me the ideal dataframe. How to do
> then?
> Thanks for your help.
>
>

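[Editorial note] The same grouping is available in base R via aggregate(); a sketch on a tiny made-up data frame using the poster's column names:

```r
DF <- data.frame(
  year  = c(2000, 2001, 2002, 2002),
  month = c(1, 1, 2, 2),
  total = c(98, 100, 99, 80),
  id    = c("GA", "GA", "GA", "GB"),
  note  = c(1, 1, 1, 1)
)

# One mean per (month, id, note); 'year' is deliberately absent from the
# formula, so years are pooled within each time slice.
avg <- aggregate(total ~ month + id + note, data = DF, FUN = mean)
avg
```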


Re: [R] The code itself disappears after starting to execute the for loop

2016-11-03 Thread jim holtman
A little more information would help.  How exactly are you creating the
output to the console?  Are you using 'print', 'cat' or something else?  Do
you have buffered output checked on the GUI?  You probably don't want it
checked, or your output will be delayed until the buffer is full -- this might
be the cause of your problem.


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Thu, Nov 3, 2016 at 1:55 PM, Maram SAlem <marammagdysa...@gmail.com>
wrote:

> Hi all,
>
> I've a question concerning the R 3.3.1 version. I have a long code that I
> used to run on versions earlier to the 3.3.1 version, and when I copied the
> code to the R console, I can still see the code while the loop is executing
> , along with the output printed after each iteration of the loop.
>
> Now, on the 3.3.1 version, after I copy the code to the console, it
> disappears and I only see the printed output of only one iteration at a
> time, that is, after the first iteration the printed output disappears (
> though it's only 6 lines, just giving me some guidance, not a long output).
> This is causing me some problems, so I don't know if there is a general
> option for R that enables me to still see the code and the output of all
> the iterations till the loop is over, as was the case with earlier R
> versions.
>
> I didn't include the code as it's a long one.
>
> Thanks a lot in advance,
>
> Maram
>
>
> Sent from my iPhone
>

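[Editorial note] If buffered output turns out to be the culprit, it can also be flushed explicitly from inside the loop; a minimal sketch:

```r
for (i in 1:3) {
  cat("iteration", i, "\n")
  flush.console()  # push buffered console output out immediately (a no-op
                   # where output is unbuffered, e.g. non-interactive sessions)
}
```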


Re: [R] How to copy and paste a row at the end of each group of a table?

2016-10-31 Thread jim holtman
try this:

> dat<-structure(list(site = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2,
+ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), day = c(1, 31, 61, 91, 121,
+ 151, 181, 211, 241, 271, 301, 331, 1, 31, 61, 91, 121, 151, 181,
+ 211, 241, 271, 301, 331), temp = c(8.3, 10.3, 9.4, 6.1, 3, 1.3,
+ 1, 0.8, 1, 1.4, 2.7, 5.1, 9, 11.2, 9.6, 5.7, 2, 0.8, 0.6, 0.4,
+ 0.4, 0.6, 1.5, 4.5)), .Names = c("site", "day", "temp"), row.names = c(NA,
+ -24L), class = "data.frame")
>
> # split the data, copy value of first row to end
> dat2 <- do.call(rbind, lapply(split(dat, dat$site), function(.site){
+ x <- rbind(.site, .site[1L, ])  # add first row to bottom
+ x$day[nrow(x)] <- 361
+ x  # return x
+ }))
> dat2
  site day temp
1.1  1   1  8.3
1.2  1  31 10.3
1.3  1  61  9.4
1.4  1  91  6.1
1.5  1 121  3.0
1.6  1 151  1.3
1.7  1 181  1.0
1.8  1 211  0.8
1.9  1 241  1.0
1.10 1 271  1.4
1.11 1 301  2.7
1.12 1 331  5.1
1.13 1 361  8.3
2.13 2   1  9.0
2.14 2  31 11.2
2.15 2  61  9.6
2.16 2  91  5.7
2.17 2 121  2.0
2.18 2 151  0.8
2.19 2 181  0.6
2.20 2 211  0.4
2.21 2 241  0.4
2.22 2 271  0.6
2.23 2 301  1.5
2.24 2 331  4.5
2.131   2 361  9.0

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Mon, Oct 31, 2016 at 8:59 AM, Kristi Glover
<kristi.glo...@hotmail.com> wrote:
> Hi R Users,
>
> I have a big table with many classes. I need to copy the first row of each
> class and put it at the end of that class.
>
> I split the table, but I am not sure how to copy the row and append it at the
> end of each class. Here is an example. I was trying to get from table "dat" to
> "dat1". In "dat", there is a column "day" with only 12 rows per class, but I
> want to copy the first row of each class and put it at the bottom of the class
> (see table "dat1"); the value of the "day" column needs to be "361", but the
> value of temp should be the same as in row 1.
>
>
> Thanks for your suggestions in advance.
>
>
> Thanks,
>
> dat<-structure(list(site = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2,
> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), day = c(1, 31, 61, 91, 121,
> 151, 181, 211, 241, 271, 301, 331, 1, 31, 61, 91, 121, 151, 181,
> 211, 241, 271, 301, 331), temp = c(8.3, 10.3, 9.4, 6.1, 3, 1.3,
> 1, 0.8, 1, 1.4, 2.7, 5.1, 9, 11.2, 9.6, 5.7, 2, 0.8, 0.6, 0.4,
> 0.4, 0.6, 1.5, 4.5)), .Names = c("site", "day", "temp"), row.names = c(NA,
> -24L), class = "data.frame")
>
> dat1<-structure(list(site = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), day = c(1, 31, 61, 91,
> 121, 151, 181, 211, 241, 271, 301, 331, 361, 1, 31, 61, 91, 121,
> 151, 181, 211, 241, 271, 301, 331, 361), temp = c(8.3, 10.3,
> 9.4, 6.1, 3, 1.3, 1, 0.8, 1, 1.4, 2.7, 5.1, 8.3, 9, 11.2, 9.6,
> 5.7, 2, 0.8, 0.6, 0.4, 0.4, 0.6, 1.5, 4.5, 9)), .Names = c("site",
> "day", "temp"), row.names = c(NA, -26L), class = "data.frame")
>
>
>
> [[alternative HTML version deleted]]
>



Re: [R] difference

2016-10-28 Thread jim holtman
I read the problem incorrectly; I did not see that you wanted the
difference from the first entry; trying again:

> require(dplyr)
> input <- read.table(text = "Year   Num
+ 200125
+ 200175
+ 2001   150
+ 200230
+ 200285
+ 200295", header = TRUE)
>
> input %>%
+ group_by(Year) %>%
+ mutate(diff = Num - Num[1L])
Source: local data frame [6 x 3]
Groups: Year [2]

   Year   Num  diff
  <int> <int> <int>
1  2001    25     0
2  2001    75    50
3  2001   150   125
4  2002    30     0
5  2002    85    55
6  2002    95    65
>
> # use data.table
> require(data.table)
> setDT(input)  # convert to data.table
> input[, diff := Num - Num[1L], by = Year][]  # print output
   Year Num diff
1: 2001  250
2: 2001  75   50
3: 2001 150  125
4: 2002  300
5: 2002  85   55
6: 2002  95   65

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Fri, Oct 28, 2016 at 12:20 AM, Ashta <sewa...@gmail.com> wrote:
> Hi all,
>
> I want to calculate the difference  between successive row values to
> the first row value within year.
> How do I get that?
>
>  Here is the sample of data
> Year   Num
> 2001    25
> 2001    75
> 2001   150
> 2002    30
> 2002    85
> 2002    95
>
> Desired output
> Year   Num  diff
> 2001    25     0
> 2001    75    50
> 2001   150   125
> 2002    30     0
> 2002    85    55
> 2002    95    65
>
> Thank you.
>

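[Editorial note] The same per-group difference from the first value is available in base R with ave(); a sketch using the sample data from the post:

```r
input <- data.frame(Year = c(2001, 2001, 2001, 2002, 2002, 2002),
                    Num  = c(25, 75, 150, 30, 85, 95))
# Within each Year, subtract that year's first Num from every Num.
input$diff <- ave(input$Num, input$Year, FUN = function(x) x - x[1])
input$diff
# [1]   0  50 125   0  55  65
```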


Re: [R] difference

2016-10-28 Thread jim holtman
Here are a couple of other ways using 'dplyr' and 'data.table'

> require(dplyr)
> input <- read.table(text = "Year   Num
+ 2001    25
+ 2001    75
+ 2001   150
+ 2002    30
+ 2002    85
+ 2002    95", header = TRUE)
>
> input %>%
+ group_by(Year) %>%
+ mutate(diff = c(0, diff(Num)))
Source: local data frame [6 x 3]
Groups: Year [2]

   Year   Num  diff

1  2001    25     0
2  2001    75    50
3  2001   150    75
4  2002    30     0
5  2002    85    55
6  2002    95    10
>
> # use data.table
> require(data.table)
Loading required package: data.table
data.table 1.9.6  For help type ?data.table or
https://github.com/Rdatatable/data.table/wiki
The fastest way to learn (by data.table authors):
https://www.datacamp.com/courses/data-analysis-the-data-table-way
---
data.table + dplyr code now lives in dtplyr.
Please library(dtplyr)!
---

Attaching package: ‘data.table’

The following objects are masked from ‘package:dplyr’:

between, last

> setDT(input)  # convert to data.table
> input[, diff := c(0, diff(Num)), by = Year][]  # print output
   Year Num diff
1: 2001  25    0
2: 2001  75   50
3: 2001 150   75
4: 2002  30    0
5: 2002  85   55
6: 2002  95   10
>

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Fri, Oct 28, 2016 at 12:20 AM, Ashta <sewa...@gmail.com> wrote:
> Hi all,
>
> I want to calculate the difference between each successive row value and
> the first row value within each year.
> How do I get that?
>
>  Here is the sample of data
> Year   Num
> 2001    25
> 2001    75
> 2001   150
> 2002    30
> 2002    85
> 2002    95
>
> Desired output
> Year   Num  diff
> 2001    25    0
> 2001    75   50
> 2001   150  125
> 2002    30    0
> 2002    85   55
> 2002    95   65
>
> Thank you.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
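Note that `c(0, diff(Num))` computes row-to-previous-row differences, whereas the question's desired output asks for row-to-first-row differences; the two agree only on the first two rows of each group. A small base-R sketch of the distinction, using the 2001 group from the example:

```r
# c(0, diff(x)) is the difference to the PREVIOUS row; x - x[1] is the
# difference to the FIRST row. They diverge from the third row onward.
x <- c(25, 75, 150)   # the Num values for Year 2001 in the example
c(0, diff(x))         # successive differences: 0 50 75
x - x[1]              # differences to first row: 0 50 125
```

Swapping `c(0, diff(Num))` for `Num - Num[1L]` inside either the dplyr or data.table grouped call reproduces the output the poster asked for.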

Re: [R] Reg : R : How to capture cpu usage, memory usage and disks info using R language

2016-10-17 Thread jim holtman
within the VBS script you can easily access remote computers.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Mon, Oct 17, 2016 at 5:58 AM, Manohar Reddy <manu.redd...@gmail.com> wrote:
> Thanks Jim.
>
>
>
>    Actually, my requirement is: I have ~20 servers running Windows Server
> OS. To check any server's CPU usage, memory usage, or disk info, I
> currently need to log into every server. Instead, if I can capture that
> kind of information using R, I can save it in an RDBMS database and then
> populate the live data on a dashboard built with R and Shiny, so that I
> can view all the information on a single page.
>
>
>
>    The challenging part for me is how to capture CPU, memory, and disk
> info using R.
>
>
> On Sun, Oct 16, 2016 at 8:37 PM, jim holtman <jholt...@gmail.com> wrote:
>>
>> Here is a start on the solution.  This will create a VBS script that
>> will gather the CPU data and return it in a character vector that you
>> can extract the data from.  You can add to it to get the other data
>> you are looking for.
>>
>> 
>> > temp <- tempfile(fileext = '.vbs')  # get a temp file
>> >
>> > # create the VBS file to collect processor data
>> > writeLines('Set objWMIService =
>> > GetObject("winmgmts:localhost\\root\\CIMV2")
>> + Set CPUInfo = objWMIService.ExecQuery("SELECT * FROM
>> Win32_PerfFormattedData_PerfOS_Processor",,48)
>> + For Each Item in CPUInfo
>> + Wscript.Echo "PercentProcessorTime: " & Item.PercentProcessorTime &
>> _
>> +  "  processor:" & Item.Name
>> + Next',
>> +  temp)
>> >
>> > results <- shell(paste("cscript", temp), intern = TRUE)  # execute using
>> > 'cscript'
>> > results # all the data
>> [1] "Microsoft (R) Windows Script Host Version 5.8"
>> [2] "Copyright (C) Microsoft Corporation. All rights reserved."
>> [3] ""
>> [4] "PercentProcessorTime: 18  processor:0"
>> [5] "PercentProcessorTime: 6  processor:1"
>> [6] "PercentProcessorTime: 6  processor:2"
>> [7] "PercentProcessorTime: 0  processor:3"
>> [8] "PercentProcessorTime: 7  processor:_Total"
>> > grep("processor:", results, value = TRUE)  # get just processor data
>> [1] "PercentProcessorTime: 18  processor:0" "PercentProcessorTime:
>> 6  processor:1"
>> [3] "PercentProcessorTime: 6  processor:2"  "PercentProcessorTime:
>> 0  processor:3"
>> [5] "PercentProcessorTime: 7  processor:_Total"
>> >
>> >
>> #
>>
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>>
>> On Fri, Oct 14, 2016 at 5:37 AM, Manohar Reddy <manu.redd...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > Is there any possibility that we can capture cpu usage ,memory usage and
>> > disks info using R language on *windows family OS* ?
>> >
>> >
>> >
>> >   I would like to see the data look like below:
>> >
>> >
>> >
>> >Cpu usage : 70 %
>> >
>> >Memory usage  : 80 %
>> >
>> >Disks: C drive – 40 % full, D drive – 60 % full, E drive – 30 % full
>> >
>> >
>> >For more info, please find the attachment.
>> >
>> >
>> >  Thanks in Advance ,Manu.
>> >
>> > __
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
>
> --
>
>
> Manu.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Reg : R : How to capture cpu usage, memory usage and disks info using R language

2016-10-16 Thread jim holtman
Here is a start on the solution.  This will create a VBS script that
will gather the CPU data and return it in a character vector that you
can extract the data from.  You can add to it to get the other data
you are looking for.


> temp <- tempfile(fileext = '.vbs')  # get a temp file
>
> # create the VBS file to collect processor data
> writeLines('Set objWMIService = 
> GetObject("winmgmts:localhost\\root\\CIMV2")
+ Set CPUInfo = objWMIService.ExecQuery("SELECT * FROM
Win32_PerfFormattedData_PerfOS_Processor",,48)
+ For Each Item in CPUInfo
+ Wscript.Echo "PercentProcessorTime: " & Item.PercentProcessorTime & _
+  "  processor:" & Item.Name
+ Next',
+  temp)
>
> results <- shell(paste("cscript", temp), intern = TRUE)  # execute using 
> 'cscript'
> results # all the data
[1] "Microsoft (R) Windows Script Host Version 5.8"
[2] "Copyright (C) Microsoft Corporation. All rights reserved."
[3] ""
[4] "PercentProcessorTime: 18  processor:0"
[5] "PercentProcessorTime: 6  processor:1"
[6] "PercentProcessorTime: 6  processor:2"
[7] "PercentProcessorTime: 0  processor:3"
[8] "PercentProcessorTime: 7  processor:_Total"
> grep("processor:", results, value = TRUE)  # get just processor data
[1] "PercentProcessorTime: 18  processor:0" "PercentProcessorTime:
6  processor:1"
[3] "PercentProcessorTime: 6  processor:2"  "PercentProcessorTime:
0  processor:3"
[5] "PercentProcessorTime: 7  processor:_Total"
>
>
#


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Fri, Oct 14, 2016 at 5:37 AM, Manohar Reddy <manu.redd...@gmail.com> wrote:
> Hi,
>
> Is there any possibility that we can capture cpu usage ,memory usage and
> disks info using R language on *windows family OS* ?
>
>
>
>   I would like to see the data look like below:
>
>
>
>Cpu usage : 70 %
>
>Memory usage  : 80 %
>
>Disks: C drive – 40 % full, D drive – 60 % full, E drive – 30 % full
>
>
>For more info, please find the attachment.
>
>
>  Thanks in Advance ,Manu.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
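Once the cscript output has been captured (the `results` vector in Jim's post), the numbers can be pulled into a data frame with base-R regexes, ready to store in an RDBMS or feed a Shiny dashboard. A sketch; the sample lines below are assumptions standing in for real WMI output in the same format:

```r
# Parse "PercentProcessorTime: <n>  processor:<id>" lines into a data frame.
# The sample vector mimics the format shown in the post (assumption).
results <- c("PercentProcessorTime: 18  processor:0",
             "PercentProcessorTime: 6  processor:1",
             "PercentProcessorTime: 7  processor:_Total")
cpu <- data.frame(
  processor = sub(".*processor:", "", results),           # text after "processor:"
  pct = as.numeric(sub("PercentProcessorTime: *([0-9]+).*", "\\1", results)),
  stringsAsFactors = FALSE
)
cpu
```

The same pattern (grep for the marker, `sub()` out the number) extends to memory and disk counters returned by other WMI classes.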

Re: [R] Problem with sample(...,size = 1000000000,...)

2016-10-15 Thread jim holtman
I forgot to add that if you have less than 16GB of memory, then you
were probably paging memory to disk and that would have take a much,
much, longer time.  When you are trying to do something BIG, do it in
some smaller steps and look at the resources that it takes (memory,
cpu, ...).

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Sat, Oct 15, 2016 at 4:06 PM, jim holtman <jholt...@gmail.com> wrote:
> Do you realize you are trying to create a vector with 1 billion
> entries, so this will take some time.  How much memory do you have on
> your computer?
>
> Here are some times to generate increasing sample sizes.  I have 16GB
> on my computer and it took only 30 seconds to generate the data and
> used almost 12GB of memory.
>
>> system.time(x<-sample(1:5,100000,TRUE,c(0.1,0.2,0.4,0.2,0.1)))
>    user  system elapsed
>       0       0       0
>> system.time(x<-sample(1:5,1000000,TRUE,c(0.1,0.2,0.4,0.2,0.1)))
>    user  system elapsed
>    0.03    0.00    0.03
>> system.time(x<-sample(1:5,10000000,TRUE,c(0.1,0.2,0.4,0.2,0.1)))
>    user  system elapsed
>    0.47    0.02    0.49
>> system.time(x<-sample(1:5,100000000,TRUE,c(0.1,0.2,0.4,0.2,0.1)))
>    user  system elapsed
>    3.09    0.24    3.33
>> system.time(x<-sample(1:5,1000000000,TRUE,c(0.1,0.2,0.4,0.2,0.1)))
>    user  system elapsed
>   30.76    1.70   32.92
>> memory.size()
> [1] 11502.52
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
>
> On Sat, Oct 15, 2016 at 12:19 PM, Huy Nguyễn <quanghuy1...@gmail.com> wrote:
>> When I ran this code:
>> "
>> x<-sample(1:5,1000000000,TRUE,c(0.1,0.2,0.4,0.2,0.1))
>> print(table(x)/1000000000)
>> plot(table(x)/1000000000,type="h",xlab="x",ylab="P(x)")
>> "
>> My laptop froze and didn't respond. Although I used Ctrl+Alt+Del to
>> terminate the R program, my laptop still did nothing, and I had to
>> restart it immediately or it might have broken down.
>> Thus, I think in the future the program should have something to control
>> memory and time while it is running, and be terminable if necessary.
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Problem with sample(...,size = 1000000000,...)

2016-10-15 Thread jim holtman
Do you realize you are trying to create a vector with 1 billion
entries, so this will take some time.  How much memory do you have on
your computer?

Here are some times to generate increasing sample sizes.  I have 16GB
on my computer and it took only 30 seconds to generate the data and
used almost 12GB of memory.

> system.time(x<-sample(1:5,100000,TRUE,c(0.1,0.2,0.4,0.2,0.1)))
   user  system elapsed
      0       0       0
> system.time(x<-sample(1:5,1000000,TRUE,c(0.1,0.2,0.4,0.2,0.1)))
   user  system elapsed
   0.03    0.00    0.03
> system.time(x<-sample(1:5,10000000,TRUE,c(0.1,0.2,0.4,0.2,0.1)))
   user  system elapsed
   0.47    0.02    0.49
> system.time(x<-sample(1:5,100000000,TRUE,c(0.1,0.2,0.4,0.2,0.1)))
   user  system elapsed
   3.09    0.24    3.33
> system.time(x<-sample(1:5,1000000000,TRUE,c(0.1,0.2,0.4,0.2,0.1)))
   user  system elapsed
  30.76    1.70   32.92
> memory.size()
[1] 11502.52

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Sat, Oct 15, 2016 at 12:19 PM, Huy Nguyễn <quanghuy1...@gmail.com> wrote:
> When I ran this code:
> "
> x<-sample(1:5,1000000000,TRUE,c(0.1,0.2,0.4,0.2,0.1))
> print(table(x)/1000000000)
> plot(table(x)/1000000000,type="h",xlab="x",ylab="P(x)")
> "
> My laptop froze and didn't respond. Although I used Ctrl+Alt+Del to
> terminate the R program, my laptop still did nothing, and I had to
> restart it immediately or it might have broken down.
> Thus, I think in the future the program should have something to control
> memory and time while it is running, and be terminable if necessary.
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
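A back-of-envelope check makes Jim's point concrete: `sample(1:5, 1e9, ...)` returns an integer vector (4 bytes per element), and downstream steps such as `table()` make further copies, so memory use balloons well past the result vector itself. A sketch of the arithmetic:

```r
# Rough memory arithmetic for a 1e9-element sample
n <- 1e9
gib <- function(bytes) bytes / 2^30   # bytes -> GiB
gib(n * 4)   # ~3.73 GiB for the integer result vector alone
gib(n * 8)   # ~7.45 GiB if the data is coerced to double along the way
```

With under ~8 GB of RAM this workload cannot avoid paging to disk, which explains the apparent freeze rather than any defect in R itself.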

Re: [R] lag, count

2016-10-15 Thread jim holtman
Here is a solution using 'dplyr'

> require(dplyr)
> lag<-read.table(text=" ID, y1, y2
+ 1,0,12/25/2014
+ 1,125,9/15/2015
+ 1,350,1/30/2016
+ 2,0,12/25/2012
+ 2,450,9/15/2014
+ 2,750,1/30/2016
+ 2,  656, 11/30/2016
+ ",sep=",",header=TRUE)
>
> new_lag <- lag %>%
+ mutate(y2 = as.Date(y2, format = "%m/%d/%Y")) %>%  # convert date
+ arrange(ID, y2) %>%  # sort if necessary
+ group_by(ID) %>%
+ mutate(flag = seq(n()),
+ y1diff = c(0, diff(y1)),
+ y2diff = c(0, diff(y2))
+ )
>
>
> new_lag
Source: local data frame [7 x 6]
Groups: ID [2]

     ID    y1         y2  flag y1diff y2diff
1     1     0 2014-12-25     1      0      0
2     1   125 2015-09-15     2    125    264
3     1   350 2016-01-30     3    225    137
4     2     0 2012-12-25     1      0      0
5     2   450 2014-09-15     2    450    629
6     2   750 2016-01-30     3    300    502
7     2   656 2016-11-30     4    -94    305

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Sat, Oct 15, 2016 at 2:54 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
> I forgot about the sorting part and assumed the data.frame was already
> sorted. If not, after converting y2 to class Date, you can do
>
> lag <- lag[order(lag$ID, lag$y2), ]
>
> Rui Barradas
>
>
> Em 15-10-2016 19:45, Rui Barradas escreveu:
>>
>> Hello,
>>
>> Try the following.
>>
>>
>> lag<-read.table(text=" ID, y1, y2
>> 1,0,12/25/2014
>> 1,125,9/15/2015
>> 1,350,1/30/2016
>> 2,0,12/25/2012
>> 2,450,9/15/2014
>> 2,750,1/30/2016
>> 2,  656, 11/30/2016
>> ",sep=",",header=TRUE)
>>
>> str(lag)
>> lag$y2 <- as.Date(lag$y2, format = "%m/%d/%Y")
>> str(lag)
>>
>> # 1)
>> flag <- ave(lag$ID, lag$ID, FUN = seq_along)
>> lag2 <- cbind(lag[1], flag, lag[-1])
>>
>> # 2)
>> y1dif <- ave(lag2$y1, lag2$ID, FUN = function(y) c(0, y[-1] -
>> y[-length(y)]))
>> y2dif <- unlist(tapply(lag2$y2, lag2$ID, FUN = function(y) c(0, y[-1] -
>> y[-length(y)])))
>>
>> lag2 <- cbind(lag2, y1dif, y2dif)
>> lag2
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> Em 15-10-2016 17:57, Val escreveu:
>>>
>>> Hi all,
>>>
>>> I want to sort the data by ID and Y2, then count the number of rows
>>> within each ID, and assign a "flag" variable to each row from the first
>>> to the last row.
>>> For instance, in the following data, ID "1" has three rows, and each
>>> row is assigned the flags 1, 2, 3 sequentially.
>>>
>>> 2. In the second step, within each ID, I want to get the difference
>>> between subsequent row values of the y1 and y2 (date) columns.
>>> Within each ID, the first value of y1diff and y2diff is always 0. Each
>>> later value is the current row minus the previous row.
>>>
>>>
>>>
>>> lag<-read.table(text=" ID, y1, y2
>>> 1,0,12/25/2014
>>> 1,125,9/15/2015
>>> 1,350,1/30/2016
>>> 2,0,12/25/2012
>>> 2,450,9/15/2014
>>> 2,750,1/30/2016
>>> 2,  656, 11/30/2016
>>> ",sep=",",header=TRUE)
>>>
>>> output looks like as follows
>>>
>>> ID,flag,y1,y2,y1dif,y2dif
>>> 1,1,0,12/25/2014,0,0
>>> 1,2,125,9/15/2015,125,264
>>> 1,3,350,1/30/2016,225,137
>>> 2,1,0,12/25/2012,0,0
>>> 2,2,450,9/15/2014,450,629
>>> 2,3,750,1/30/2016,300,502
>>> 2, 4, 656, 11/30/2016, -94, 305
>>>
>>> Thank you
>>>
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
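Consolidating Rui's base-R approach into one self-contained sketch that reproduces the desired output (the data are rebuilt inline as an assumption; dates are converted before sorting so `order()` and `diff()` operate on real dates):

```r
# Base-R flag + within-ID differences, no packages required
lag <- data.frame(
  ID = c(1, 1, 1, 2, 2, 2, 2),
  y1 = c(0, 125, 350, 0, 450, 750, 656),
  y2 = as.Date(c("2014-12-25", "2015-09-15", "2016-01-30",
                 "2012-12-25", "2014-09-15", "2016-01-30", "2016-11-30"))
)
lag <- lag[order(lag$ID, lag$y2), ]                 # sort by ID, then date
lag$flag   <- ave(lag$ID, lag$ID, FUN = seq_along)  # row counter within ID
lag$y1diff <- ave(lag$y1, lag$ID, FUN = function(x) c(0, diff(x)))
lag$y2diff <- ave(as.numeric(lag$y2), lag$ID,       # day differences
                  FUN = function(x) c(0, diff(x)))
lag
```

`as.numeric()` on the Date column makes `ave()` return plain day counts instead of a difftime, matching the desired y2dif column.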


Re: [R] Writing data onto xlsx file without cell formatting

2016-09-26 Thread jim holtman
I use the "openxlsx" package to handle spreadsheets.


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Mon, Sep 26, 2016 at 5:56 PM, Christofer Bogaso <
bogaso.christo...@gmail.com> wrote:

> Hi again,
>
> I have been following above suggestion to export data from R to xlsx
> file using XLconnect. However recently I am facing Java memory
> allocation problem with large dataset (looks like a known issue with
> this package) and therefore decided to move to using "xlsx" package.
>
> Now I started facing that same problem of losing my existing formating
> when I use xlsx package for data export. Can someone help me with some
> pointer on how can I preserve the cell formating after exporting
> data.frame to some existing xlsx file using "xlsx" package.
>
> Thanks for your time.
>
> On Mon, Jul 11, 2016 at 10:43 AM, Ismail SEZEN <sezenism...@gmail.com>
> wrote:
> > I think, this is what you are looking for:
> >
> > http://stackoverflow.com/questions/11228942/write-from-
> r-into-template-in-excel-while-preserving-formatting
> >
> > On 11 Jul 2016, at 03:43, Christofer Bogaso <bogaso.christo...@gmail.com
> >
> > wrote:
> >
> > Hi again,
> >
> > I am trying to write a data frame to an existing Excel file (xlsx)
> > from row 5 and column 6 of the 1st Sheet. I was going through a
> > previous instruction which is available here :
> >
> > http://stackoverflow.com/questions/32632137/using-
> write-xlsx-in-r-how-to-write-in-a-specific-row-or-column-in-excel-file
> >
> > However trouble is that it is modifying/removing formatting of all the
> > affected cells. I have predefined formatting of those cells where data
> > to be pasted, and I dont want to modify or remove that formatting.
> >
> > Any idea if I need to pass some additional argument.
> >
> > Appreciate your valuable feedback.
> >
> > Thanks,
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to remove all messages when loading a library ?

2016-09-25 Thread jim holtman
Try enclosing the whole thing in "suppressMessages";

suppressMessages({
if (!require("plyr",quietly=TRUE,warn.conflicts=FALSE))
suppressMessages(install.packages("plyr",quietly=TRUE))
if (!require("dplyr",quietly=TRUE))
suppressMessages(install.packages("dplyr",quietly=TRUE))
if (!require("stringr",quietly=TRUE))
suppressMessages(install.packages("stringr",quietly=TRUE))
if (!require("readr",quietly=TRUE))
suppressMessages(install.packages("readr",quietly=TRUE))
if (!require("tidyr",quietly=TRUE))
suppressMessages(install.packages("tidyr",quietly=TRUE))
if (!require("XML",quietly=TRUE))
suppressMessages(install.packages("XML",quietly=TRUE))
if (!require("Rcpp",quietly=TRUE))
suppressMessages(install.packages("Rcpp",quietly=TRUE))
if (!require("rbenchmark",quietly=TRUE))
suppressMessages(install.packages("rbenchmark",quietly=TRUE))
if (!require("tiff",quietly=TRUE))
suppressMessages(install.packages("tiff",quietly=TRUE))
if (!require("xlsx",quietly=TRUE))
suppressMessages(install.packages("xlsx",quietly=TRUE))
if (!require("ROracle",quietly=TRUE))
suppressMessages(install.packages("T:/CH/R/ROracle_1.2-2.zip", repos =
NULL, type = "source",quietly=TRUE))
})




Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Sep 25, 2016 at 3:22 PM, Fabien Tarrade <fabien.tarr...@gmail.com>
wrote:

> Hi there,
>
> I would like to remove all messages when I load a library (in fact I am
> using "require" and "install.packages"). I have tried many options and
> looked at the documentation for the two functions.
> For example I am using the following piece of code: init.R
>
> print("step 1")
> # Install library if not installed
> if (!require("plyr",quietly=TRUE,warn.conflicts=FALSE))
> suppressMessages(install.packages("plyr",quietly=TRUE))
> if (!require("dplyr",quietly=TRUE)) suppressMessages(install.packa
> ges("dplyr",quietly=TRUE))
> if (!require("stringr",quietly=TRUE)) suppressMessages(install.packa
> ges("stringr",quietly=TRUE))
> if (!require("readr",quietly=TRUE)) suppressMessages(install.packa
> ges("readr",quietly=TRUE))
> if (!require("tidyr",quietly=TRUE)) suppressMessages(install.packa
> ges("tidyr",quietly=TRUE))
> if (!require("XML",quietly=TRUE)) suppressMessages(install.packa
> ges("XML",quietly=TRUE))
> if (!require("Rcpp",quietly=TRUE)) suppressMessages(install.packa
> ges("Rcpp",quietly=TRUE))
> if (!require("rbenchmark",quietly=TRUE)) suppressMessages(install.packa
> ges("rbenchmark",quietly=TRUE))
> if (!require("tiff",quietly=TRUE)) suppressMessages(install.packa
> ges("tiff",quietly=TRUE))
> if (!require("xlsx",quietly=TRUE)) suppressMessages(install.packa
> ges("xlsx",quietly=TRUE))
> if (!require("ROracle",quietly=TRUE)) suppressMessages(install.packa
> ges("T:/CH/R/ROracle_1.2-2.zip", repos = NULL, type =
> "source",quietly=TRUE))
> print("step 2")
>
> and I run the code in this way:
> > source("./init.R",encoding = "UTF-8",verbose=FALSE,echo=FAL
> SE,print.eval=FALSE)
>
> and I get the following output that I don't manage to remove:
>
> [1] "step 1"
>
> Attache Paket: ‘dplyr’
>
> Die folgenden Objekte sind maskiert von ‘package:plyr’:
>
> arrange, count, desc, failwith, id, mutate, rename, summarise,
> summarize
>
> Die folgenden Objekte sind maskiert von ‘package:stats’:
>
> filter, lag
>
> Die folgenden Objekte sind maskiert von ‘package:base’:
>
> intersect, setdiff, setequal, union
> [1] "step 2"
>
> What is the way to get no messages at all when loading a library (or
> when using other R functions in general)?
> I am trying to keep only the important messages and so far I am getting
> way too much messages.
>
> Thanks
> Cheers
> Fabien
>
>
> --
> Dr Fabien Tarrade
>
> Quantitative Analyst/Developer - Data Scientist
>
> Senior data analyst specialised in the modelling, processing and
> statistical treatment of data.
> PhD in Physics, 10 years of experience as researcher at the forefront of
> international scientific research.
> Fascinated by finance and data modelling.
>
> Zurich, Switzerland
>
> Email : cont...@fabien-tarrade.eu <mailto:cont...@fabien-tarrade.eu&g
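A compact pattern for Fabien's question: `suppressPackageStartupMessages()` is the purpose-built wrapper for exactly this startup chatter, and combining it with `quietly`/`warn.conflicts` silences the attach-time conflict notes (such as dplyr's masking messages) as well. A sketch; the helper name is made up for illustration:

```r
# Hypothetical helper: attach a package with all startup chatter suppressed
quiet_library <- function(pkg) {
  suppressPackageStartupMessages(
    library(pkg, character.only = TRUE, quietly = TRUE, warn.conflicts = FALSE)
  )
}
quiet_library("stats")   # attaches silently; works the same way for dplyr etc.
```

`suppressMessages()` also works, but `suppressPackageStartupMessages()` is narrower: it only swallows messages signalled as startup messages, so genuine diagnostics from your own code still get through.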

Re: [R] Accelerating binRead

2016-09-17 Thread jim holtman
Here is an example of how to do it:

x <- 1:10  # integer values
xf <- seq(1.0, 2, by = 0.1)  # floating point

setwd("d:/temp")

# create file to write to
output <- file('integer.bin', 'wb')
writeBin(x, output)  # write integer
writeBin(xf, output)  # write reals
close(output)


library(pack)
library(readr)

# read all the data at once
allbin <- read_file_raw('integer.bin')

# decode the data into a list
(result <- unpack("V V V V V V V V V V d d d d d d d d d d", allbin))




Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, Sep 17, 2016 at 11:04 AM, Ismail SEZEN <sezenism...@gmail.com>
wrote:

> I noticed same issue but didnt care much :)
>
> On Sat, Sep 17, 2016, 18:01 jim holtman <jholt...@gmail.com> wrote:
>
>> Your example was not reproducible.  Also how do you "break" out of the
>> "while" loop?
>>
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>> On Sat, Sep 17, 2016 at 8:05 AM, Philippe de Rochambeau <phi...@free.fr>
>> wrote:
>>
>> > Hello,
>> > the following function, which stores numeric values extracted from a
>> > binary file into an R matrix, is very slow, especially when the said
>> > file is several MB in size.
>> > Should I rewrite the function in inline C or in C/C++ using Rcpp? If the
>> > latter case is true, how do you « readBin »  in Rcpp (I’m a total Rcpp
>> > newbie)?
>> > Many thanks.
>> > Best regards,
>> > phiroc
>> >
>> >
>> > -
>> >
>> > # inputPath is something like http://myintranet/getData?
>> > pathToFile=/usr/lib/xxx/yyy/data.bin <http://myintranet/getData?
>> > pathToFile=/usr/lib/xxx/yyy/data.bin>
>> >
>> > PLTreader <- function(inputPath){
>> > URL <- file(inputPath, "rb")
>> > PLT <- matrix(nrow=0, ncol=6)
>> > compteurDePrints = 0
>> > compteurDeLignes <- 0
>> > maxiPrints = 5
>> > displayData <- FALSE
>> > while (TRUE) {
>> > periodIndex <- readBin(URL, integer(), size=4, n=1,
>> > endian="little") # int (4 bytes)
>> > eventId <- readBin(URL, integer(), size=4, n=1,
>> > endian="little") # int (4 bytes)
>> > dword1 <- readBin(URL, integer(), size=4, signed=FALSE,
>> > n=1, endian="little") # int
>> > dword2 <- readBin(URL, integer(), size=4, signed=FALSE,
>> > n=1, endian="little") # int
>> > if (dword1 < 0) {
>> > dword1 = dword1 + 2^32-1;
>> > }
>> > eventDate = (dword2*2^32 + dword1)/1000
>> > repNum <- readBin(URL, integer(), size=2, n=1,
>> > endian="little") # short (2 bytes)
>> > exp <- readBin(URL, numeric(), size=4, n=1,
>> > endian="little") # float (4 bytes, strangely enough, would expect 8)
>> > loss <- readBin(URL, numeric(), size=4, n=1,
>> > endian="little") # float (4 bytes)
>> > PLT <- rbind(PLT, c(periodIndex, eventId, eventDate,
>> > repNum, exp, loss))
>> > } # end while
>> > return(PLT)
>> > close(URL)
>> > }
>> >
>> > 
>> > [[alternative HTML version deleted]]
>> >
>> > __
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/
>> > posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/
>> posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
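The main cost in the posted loop is millions of tiny `readBin()` calls plus `rbind()` growing the matrix on every iteration. Reading the whole file into a raw vector once and decoding each field for all records in a single `readBin()` call avoids both. A minimal sketch with a simplified 12-byte record (int32 id + float64 value) rather than the full PLT layout:

```r
# Write a tiny demo file of 3 fixed-size records: id (int32) + value (double)
path <- tempfile(fileext = ".bin")
con  <- file(path, "wb")
for (i in 1:3) {
  writeBin(i, con, size = 4L, endian = "little")         # 4-byte integer
  writeBin(i * 0.5, con, size = 8L, endian = "little")   # 8-byte double
}
close(con)

# Read everything once, then decode column-wise
raw_all <- readBin(path, "raw", n = file.size(path))
rec_len <- 12L
nrec    <- length(raw_all) %/% rec_len
# bytes of one field (byte offset `off`, width `size`) across all records
slice <- function(off, size)
  raw_all[as.vector(outer(seq_len(size),
                          (seq_len(nrec) - 1L) * rec_len + off, "+"))]
ids    <- readBin(slice(0L, 4L), "integer", n = nrec, size = 4L,
                  endian = "little")
values <- readBin(slice(4L, 8L), "double",  n = nrec, size = 8L,
                  endian = "little")
ids      # 1 2 3
values   # 0.5 1.0 1.5
```

The same slicing generalizes to the 26-byte PLT record by calling `slice()` once per field with that field's offset and width, then assembling the columns into a data frame at the end instead of `rbind()`-ing row by row.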

Re: [R] Accelerating binRead

2016-09-17 Thread jim holtman
I would also suggest that you take a look at the 'pack' package which can
convert the binary input to the value you want.  Part of your performance
problems might be all the short reads that you are doing.


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, Sep 17, 2016 at 11:04 AM, Ismail SEZEN <sezenism...@gmail.com>
wrote:

> I noticed same issue but didnt care much :)
>
> On Sat, Sep 17, 2016, 18:01 jim holtman <jholt...@gmail.com> wrote:
>
>> Your example was not reproducible.  Also how do you "break" out of the
>> "while" loop?
>>
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>> On Sat, Sep 17, 2016 at 8:05 AM, Philippe de Rochambeau <phi...@free.fr>
>> wrote:
>>
>> > Hello,
>> > the following function, which stores numeric values extracted from a
>> > binary file into an R matrix, is very slow, especially when the said
>> > file is several MB in size.
>> > Should I rewrite the function in inline C or in C/C++ using Rcpp? If the
>> > latter case is true, how do you « readBin »  in Rcpp (I’m a total Rcpp
>> > newbie)?
>> > Many thanks.
>> > Best regards,
>> > phiroc
>> >
>> >
>> > -
>> >
>> > # inputPath is something like http://myintranet/getData?

Re: [R] Accelerating binRead

2016-09-17 Thread jim holtman
Your example was not reproducible.  Also how do you "break" out of the
"while" loop?
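A common cure for this kind of slowness, sketched below (not from the thread): read the whole file into a raw vector once, then decode each field for all records with a single readBin() call, instead of calling readBin() seven times per record and growing PLT with rbind(). The 26-byte record layout (int, int, uint, uint, short, float, float) is inferred from the reads in the quoted code; adjust if the real format differs.

```r
# Sketch: decode all records at once from a raw vector.
# Assumed layout per record: 4+4+4+4+2+4+4 = 26 bytes.
parsePLT <- function(path) {
  raw_all <- readBin(path, "raw", n = file.info(path)$size)
  rec <- 26L
  n <- length(raw_all) %/% rec
  m <- matrix(raw_all[seq_len(n * rec)], nrow = rec)  # one column per record
  getInt <- function(rows, size) readBin(as.vector(m[rows, , drop = FALSE]),
                                         integer(), n = n, size = size,
                                         endian = "little")
  getFlt <- function(rows) readBin(as.vector(m[rows, , drop = FALSE]),
                                   numeric(), n = n, size = 4,
                                   endian = "little")
  periodIndex <- getInt(1:4, 4)
  eventId     <- getInt(5:8, 4)
  dword1      <- getInt(9:12, 4)    # read signed; fix up negatives below
  dword2      <- getInt(13:16, 4)
  dword1 <- ifelse(dword1 < 0, dword1 + 2^32, dword1)
  eventDate <- (dword2 * 2^32 + dword1) / 1000
  repNum <- getInt(17:18, 2)
  expo   <- getFlt(19:22)
  loss   <- getFlt(23:26)
  cbind(periodIndex, eventId, eventDate, repNum, expo, loss)
}
```

On files of a few MB this turns millions of readBin() calls into a dozen, which is usually enough to avoid dropping down to Rcpp at all.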


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, Sep 17, 2016 at 8:05 AM, Philippe de Rochambeau <phi...@free.fr>
wrote:

> Hello,
> the following function, which stores numeric values extracted from a
> binary file, into an R matrix, is very slow, especially when the said file
> is several MB in size.
> Should I rewrite the function in inline C or in C/C++ using Rcpp? If the
> latter case is true, how do you « readBin »  in Rcpp (I’m a total Rcpp
> newbie)?
> Many thanks.
> Best regards,
> phiroc
>
>
> -
>
> # inputPath is something like http://myintranet/getData?pathToFile=/usr/lib/xxx/yyy/data.bin
>
> PLTreader <- function(inputPath){
>     URL <- file(inputPath, "rb")
>     PLT <- matrix(nrow=0, ncol=6)
>     compteurDePrints <- 0
>     compteurDeLignes <- 0
>     maxiPrints <- 5
>     displayData <- FALSE
>     while (TRUE) {
>         periodIndex <- readBin(URL, integer(), size=4, n=1, endian="little")  # int (4 bytes)
>         eventId <- readBin(URL, integer(), size=4, n=1, endian="little")      # int (4 bytes)
>         dword1 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little")  # unsigned int
>         dword2 <- readBin(URL, integer(), size=4, signed=FALSE, n=1, endian="little")  # unsigned int
>         if (dword1 < 0) {
>             dword1 <- dword1 + 2^32 - 1
>         }
>         eventDate <- (dword2*2^32 + dword1)/1000
>         repNum <- readBin(URL, integer(), size=2, n=1, endian="little")       # short (2 bytes)
>         exp <- readBin(URL, numeric(), size=4, n=1, endian="little")          # float (4 bytes, strangely enough, would expect 8)
>         loss <- readBin(URL, numeric(), size=4, n=1, endian="little")         # float (4 bytes)
>         PLT <- rbind(PLT, c(periodIndex, eventId, eventDate, repNum, exp, loss))
>     } # end while (note: no break, so the loop never exits normally)
>     return(PLT)
>     close(URL)  # unreachable after return()
> }
>
> 
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


Re: [R] gsub: replacing slashes in a string

2016-09-14 Thread jim holtman
try this:

> gsub("\\", "/", test, fixed = TRUE)
[1] "8/24/2016" "8/24/2016" "6/16/2016" "6/16/2016"
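To see why the escaping trips people up: "\\" in an R string literal is a single backslash character, and fixed = TRUE sidesteps regex interpretation entirely. A small sketch:

```r
test <- c("8/24/2016", "6\\16\\2016")
nchar("\\")                          # 1: two characters in source, one backslash
gsub("\\", "/", test, fixed = TRUE)  # literal replacement of each backslash
gsub("\\\\", "/", test)              # same result via regex: the pattern string
                                     # is \\ , i.e. one escaped backslash
```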




Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Wed, Sep 14, 2016 at 12:25 PM, Joe Ceradini <joecerad...@gmail.com>
wrote:

> Hi all,
>
> There are many R help posts out there dealing with slashes in gsub. I
> understand backslashes are "escape characters" and thus need to be treated
> differently, and display differently in R. However, I'm still stuck on a
> find-and-replace problem, and would appreciate any tips. Thanks!
>
> GOAL: replace all "\\" with "/", so when export file to csv all slashes are
> the same.
>
> (test <- c("8/24/2016", "8/24/2016", "6/16/2016", "6\\16\\2016"))
>
> Lengths are all the same, I think (?) because of how R displays/deals with
> slashes. However, when I export this to a csv, e.g., there are still double
> slashes, which is a problem for me.
> nchar(test)
>
> Change direction of slashes - works.
> (test2 <- gsub("\\", "//", test, fixed = TRUE))
>
> Now the lengths are not the same
> nchar(test2)
>
> Change from double to single - does not work. Is this because it actually
> is a single slash but R is just displaying it as double? Regardless, when I
> export from R the double slashes do appear.
> gsub("", "//", test2, fixed = TRUE)
> gsub("", "//", test2)
> gsub("", "", test2, fixed = TRUE)
> gsub("", "", test2)
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] parsing the file

2016-08-28 Thread jim holtman
Here is an attempt at parsing the data.  It is fixed field, so a regular
expression can extract the data.  Some of it does not seem to make sense,
since it has curly brackets in the data.
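As an illustration of the fixed-field idea (the widths below are guesses from the first record, not the real layout):

```r
# Hypothetical split of one 33-character record into fields by position.
line <- "1176552 CL20031031367RBV319920901"
m <- regmatches(line, regexec("^(.{7}) (.{2})(.{8})(.{6})(.{9})$", line))[[1]]
fields <- m[-1]   # drop the full match
# fields: "1176552" "CL" "20031031" "367RBV" "319920901"
```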


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Aug 28, 2016 at 8:49 AM, Glenn Schultz <glennmschu...@me.com> wrote:

> Hi Jim,
>
> Attached is the layout of the file I would like to parse with dput sample
> of the data.  From the layout it seems to me there are two sets in the data
> Header and Details.  I would like to either parse such that
>
>
>- I have either 1 comma delimited file of all data or
>- 2 comma delimited files one of header the other of details
>
>
> I have never seen a file layout described in the manner before.
> Consequently, I am a little confused as to how to work with the file.
>
> Best,
> Glenn
>
> "1176552 CL20031031367RBV319920901
>
>
>  217655208875{08875{08875{08875{08875{08875{22D22D22D22D22D2
> 2D13C13C13C13C13C13C604000{604000{604000{604
> 000{604000{604000{36{36{36{36{36{36{08500{08500{08500{08500{08500{
> 08500{1254240 CL20031031371KLV120020201
>
>
>  225424007484{07250{07375{07500{07625{08625{33F06H33H33I34{3
> 4A02A01I02{02{02A03B0001121957C123500{92{0001280
> 000{0001741000{0003849000{35I30{36{36{36{36{07000{07000{07000{07000{07000{
> 07000{1254253 CL20031031371KMA620020301
>
>
>  225425306715{06250{06500{06750{06875{07000{33C23G33C33I34{3
> 4A02{01I02{02{02A02C946646A35{85{0001030
> 000{0001205000{000130{35H30{36{36{36{36{06000{06000{06000{06000{06000{
> 06000{1259455 CL20031031371RE4420020501
>
>
>  225945507045{06750{06875{07000{07250{07375{34{28B34A34B34B3
> 4C01H01G01H01H01H02C93E36{765000{995
> 000{0001384000{0002184000{35I30{36{36{36{36{06500{06500{06500{06500{06500{
> 06500{1261060 CI20031031371S5V219940101
>
>
>  226106006637{06500{06500{06625{06750{06875{05B00C04H05I06B0
> 6B11H11G11G11H11H11I0001169090I65{95{0001250
> 000{0001328000{000190{18{18{18{18{18{18{06000{06000{06000{06000{06000{
> 06000{1335271 CI20031031375HMU519960101
>
>
>  233527107500{07500{07500{07500{07500{07500{08B06B08E08F08F0
> 8F09D09D09D09D09E09E717375{464000{55{770
> 000{0001085500{0001085500{18{18{18{18{18{18{07000{07000{07000{07000{07000{
> 07000{1440840 CL20031031380HV9519981101
>
>
>  244084006707{06500{06625{06750{06875{06875{27D03C28C29H30{3
> 0A06{05I06{06{06{06A615172I25{621000{
> 673000{75{791000{36{36{36{36{36{36{06000{
> 06000{06000{06000{06000{06000{1521993 CI20031031384E3A62101
>
>
>252199306937{06875{06875{06875{07000{07000{12H02H12H13{13D1
> 3E04E04E04E04E04F04F0001129428F70{955000{0001000
> 000{0002087000{0002087000{18{18{18{18{18{18{06500{06500{0650
> 0{06500{06500{06500{1538080 CL20031031384YXH42501
>
>
>  253808008875{08875{08875{08875{08875{08875{31I31I31I31I31I3
> 1I04A04A04A04A04A04A0001419300{0001419300{0001419300{0001419
> 300{0001419300{0001419300{36{36{36{36{36{36{07000{07000{07000{07000{07000{
> 07000{1659123 CI20031031390XG8720020801
>
>
>  265912306909{06750{06750{06875{07000{07125{16E15I16C16E16F1
> 6F01E01D01D01E01E01G998541G162000{792000{0001156
> 500{000160{000199{18{18{18{18{18{18{06000{06000{06000{06000{06000{
> 06000{"
>
require(stringr)

# input data
data2 <- c("1176552 CL20031031367RBV319920901 
217655208875{08875{08875{08875{08875{08875{22D22D22D22D22D22D13C13C13C13C13C13C604000{604000{604000{604000{604000{604000{36{36{36{36{36{36{08500{08500{08500{08500{08500{08500{",
 "1254240 CL20031031371KLV120020201 
225424007484{07250{07375{07500{07625{08625{33F06H33H33I34{34A02A01I02{02{02A03B0001121957C123500{92{000128{0001741000{0003849000{35I30{36{36{36{36{07000{07000{07000{07000{07000{07000{",
 "1254253 CL20031031371KMA620020301 
225425306715{06250{06500{06750{06875{07000{33C23G33C33I34{34A02{01I02{02{02A02C946646A35{85{000103{0001205000{000130{35H30{36{36{36{36{06000{06000{06000{06000{06000{06000{",
 "1259455 CL20031031371RE4420020501 
225945507045{06750{06875{07000{07250{07375{34{28B34A34B34B34C01H01G01H01H01H02C93E36{765000{995000{0001384000{0002184000{35I30{36{36{36{36{06500{06500{06500{06500{06500{06500{",
 "1261060 CI20031031371S5V219940101 
226106006637{06500{06500{06625{06750{06875{05B00C04H05I06B06B11H11G11G11H11H11I0001169090I65{95{000125{0001328000{000190{18{18{18{18{18{18{06000{06000{06000{06000{06000{06000{",
 "1335271 CI20031031375HMU51

Re: [R] parsing a complex file

2016-08-27 Thread jim holtman
It is not clear as to how you want to parse the file.  You need to at least
provide an example of what you expect from the output.  You mention " the
detail which begins with 2 at byte location 1 to another file"; I don't see
the '2' at byte location 1.
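If the record type really is the first byte of each line, as the question intends, routing records to two files is short — a sketch on made-up lines (file names are placeholders):

```r
# Route lines to two files by their first character, assuming '1' marks
# a header record and '2' a detail record.
lines <- c("1HEADER-A", "2DETAIL-1", "2DETAIL-2", "1HEADER-B")
first <- substr(lines, 1, 1)
writeLines(lines[first == "1"], "header.txt")
writeLines(lines[first == "2"], "detail.txt")
```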


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, Aug 27, 2016 at 4:56 PM, Glenn Schultz <glennmschu...@me.com> wrote:

> All,
>
> I have a complex file I would like to parse in R a sample is described
> below
>
> The header is 1:200 and the detail is 1 to 200.  I have written code to
> parse the file so far.  As follows:
>
> numchar <- nchar(x = data, type = "chars")
> start <- c(seq(1, numchar, 398))
> end <- c(seq(398, numchar, 398))
> quartile <- NULL
> final <- str_sub(data, start[1:length(start)], end[1:length(end)])
> quartile <- append(quartile, final)
> write(quartile, Result)
> data2 <- readLines(Result)
>
> The function gets me to data2.  All is well so far. However, I need to
> send the header which begins with 1 at byte location 1 to a file and the
> detail which begins with 2 at byte location 1 to another file.  When I look
> at data2 in RStudio  I see the following.  The file is 185 meg, I have the
> lines but I am stuck as to the next step.  Any ideas are appreciated.
>
> Glenn
>
>
> dput of the data
>
> "1176552 CL20031031367RBV319920901
>
>
>  217655208875{08875{08875{08875{08875{08875{22D22D22D22D22D2
> 2D13C13C13C13C13C13C604000{604000{604000{
> 604000{604000{604000{36{36{36{36{36{36{
> 08500{08500{08500{08500{08500{08500{1254240 CL20031031371KLV120020201
>
>
>  225424007484{07250{07375{07500{07625{08625{33F06H33H33I34{
> 34A02A01I02{02{02A03B0001121957C123500{92{
> 000128{0001741000{0003849000{35I30{36{36{36{36{
> 07000{07000{07000{07000{07000{07000{1254253 CL20031031371KMA620020301
>
>
>  225425306715{06250{06500{06750{06875{07000{33C23G33C33I34{
> 34A02{01I02{02{02A02C946646A35{85{
> 000103{0001205000{000130{35H30{36{36{36{36{
> 06000{06000{06000{06000{06000{06000{1259455 CL20031031371RE4420020501
>
>
>  225945507045{06750{06875{07000{07250{07375{34{28B34A34B34B3
> 4C01H01G01H01H01H02C93E36{765000{
> 995000{0001384000{0002184000{35I30{36{36{36{36{
> 06500{06500{06500{06500{06500{06500{1261060 CI20031031371S5V219940101
>
>
>  226106006637{06500{06500{06625{06750{06875{05B00C04H05I06B0
> 6B11H11G11G11H11H11I0001169090I65{95{
> 000125{0001328000{000190{18{18{18{18{18{18{
> 06000{06000{06000{06000{06000{06000{1335271 CI20031031375HMU519960101
>
>
>  233527107500{07500{07500{07500{07500{07500{08B06B08E08F08F0
> 8F09D09D09D09D09E09E717375{464000{55{
> 77{0001085500{0001085500{18{18{18{18{18{18{
> 07000{07000{07000{07000{07000{07000{1440840 CL20031031380HV9519981101
>
>
>  244084006707{06500{06625{06750{06875{06875{27D03C28C29H30{
> 30A06{05I06{06{06{06A615172I25{6
> 21000{673000{75{791000{36{36{36{36{36{36{
> 06000{06000{06000{06000{06000{06000{1521993 CI20031031384E3A62101
>
>
>  252199306937{06875{06875{06875{07000{07000{12H02H12H13{13D1
> 3E04E04E04E04E04F04F0001129428F70{955000{0001000
> 000{0002087000{0002087000{18{18{18{18{18{18{06500{06500{
> 06500{06500{06500{06500{1538080 CL20031031384YXH42501
>
>
>  253808008875{08875{08875{08875{08875{08875{31I31I31I31I31I3
> 1I04A04A04A04A04A04A0001419300{0001419300{0001419300{
> 0001419300{0001419300{0001419300{36{36{36{36{36{36{
> 07000{07000{07000{07000{07000{07000{1659123 CI20031031390XG8720020801
>
>
>  265912306909{06750{06750{06875{07000{07125{16E15I16C16E16F1
> 6F01E01D01D01E01E01G998541G162000{792000{
> 0001156500{000160{000199{18{18{18{18{18{18{
> 06000{06000{06000{06000{06000{06000{"
>
>
> dput data2
> c("1176552 CL20031031367RBV319920901 217655208875{08875{08875{08875
> {08875{08875{22D22D22D22D22D22D13C13C13C13C13C13C604000{
> 604000{604000{604000{604000{604000{36{36{36{36{36{36{
> 08500{08500{08500{08500{08500{08500{", "1254240 CL20031031371KLV120020201
> 225424007484{07250{07375{07500{07625{08625{33F06H33H33I34{
> 34A02A01I02{02{02A03B0001121957C123500{92{
> 000128{0001741000{0003849000{35I30{36{36{36{36{
> 07000{07000{07000{07000{07000{07000{", "1254253 CL20031031371KMA620020301
> 225425306715{06250{06500{06750{06875{07000{33C23G33C33I34{
> 34A02{01I02{02{02A02C946646A35{

Re: [R] read.xlsx function crashing R Studio

2016-08-22 Thread jim holtman
try the openxlsx package
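Two ways out of the Java memory errors, sketched below: openxlsx has no Java dependency at all, and if the xlsx package is kept, the heap option only takes effect when set before any Java-backed package loads (big.xlsx and the 4g figure are placeholders):

```r
# Option (a): openxlsx reads .xlsx with no Java, so no heap tuning
# (assumes the package is installed):
#   library(openxlsx)
#   dat <- read.xlsx("big.xlsx", sheet = 1)

# Option (b): keep the xlsx package, but set the JVM heap BEFORE loading
# it -- the JVM reads the option only at startup:
options(java.parameters = "-Xmx4g")
#   library(xlsx)
#   dat <- read.xlsx("big.xlsx", sheetIndex = 1)
```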


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Aug 21, 2016 at 1:30 PM, Kevin Kowitski <k.kowit...@icloud.com>
wrote:

> Hey everyone,
>
>I have used read.xlsx in the past rather than XLConnect for importing
> Excel data to R.  However, I have been finding now that the read.xlsx
> function has been causing my R studio to Time out.  I thought it might be
> because the R studio I had was out of date so I installed R studio X64
> 3.3.1 and reinstalled the xlsx package but it is still failing.  I have
> been trying to use XLConnect in it's place which has been working, excpet
> that I am running into memory error:
>   Error: OutOfMemoryError (Java): GC overhead limit exceeded
>
> I did some online searching and found an option to increase memory:
>   "options(java.parameters = "-Xmx4g" )
>
> but it resulted in this new memory Error:
>
>  Error: OutOfMemoryError (Java): Java heap space
>
> Can anyone provide me with some help on getting the read.xlsx function
> working?
>
> -Kevin
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posti
> ng-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] Conditionally remove rows with logic

2016-08-10 Thread jim holtman
try this:

> input <- read.table(text = "ID TIME LABEL
+  1  0  0
+  1  3  0
+  1  6  0
+  1  9  0
+  1 12  1
+  1 15  0
+  1 18  0
+  2  0  0
+  2  3  0
+  2  6  1
+  2  9  0
+  2 12  0
+  2 15  0
+  2 18  0", header = TRUE)
>
>  result <- do.call(rbind,
+ lapply(split(input, input$ID), function(.id){
+ indx <- which(.id$LABEL == 1)
+ if (length(indx) == 1) .id <- .id[1:indx, ]  # keep upto the '1'
+ .id
+ })
+ )
>
>
> result
     ID TIME LABEL
1.1   1    0     0
1.2   1    3     0
1.3   1    6     0
1.4   1    9     0
1.5   1   12     1
2.8   2    0     0
2.9   2    3     0
2.10  2    6     1
>
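For comparison, a base-R sketch of the same trimming using ave() and cumsum() on made-up data (not the poster's): cumsum(cumsum(z)) exceeds 1 only strictly after the first 1 in z, so rows after the first LABEL of 1 drop out, while IDs with no 1 keep all their rows.

```r
df <- data.frame(ID    = c(1, 1, 1, 2, 2, 2),
                 TIME  = c(0, 3, 6, 0, 3, 6),
                 LABEL = c(0, 1, 0, 0, 0, 1))
# keep is 1 up to and including each ID's first LABEL == 1, then 0
keep <- ave(df$LABEL, df$ID, FUN = function(z) cumsum(cumsum(z)) <= 1)
result <- df[keep == 1, ]
# result keeps the first two rows of ID 1 and all three rows of ID 2
```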


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Aug 7, 2016 at 6:21 PM, Jennifer Sheng <jennifer.sheng2...@gmail.com
> wrote:

> Dear all,
>
> I need to remove any rows AFTER the label becomes 1.  For example, for ID
> 1, the two rows with TIME of 15 & 18 should be removed; for ID 2, any rows
> after time 6, i.e., rows of time 9-18, should be removed.  Any
> suggestions?  Thank you very much!
>
> The current dataset looks like the following:
> ID TIME LABEL
> 100
> 130
> 160
> 190
> 112  1
> 115  0
> 118   0
> 200
> 230
> 261
> 290
> 212  0
> 215  0
> 218  0
>
> Thanks a lot!
> Jennifer
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] Strange message after reading multiple scripts from one folder

2016-07-29 Thread jim holtman
Hard to tell without seeing the scripts.  Do you have a matrix in your
scripts that has "value" and "visible" as row names?  You probably have
some statement that is producing output, so the fix is on your side.  Look
at your scripts to see whether anything refers to either "value" or
"visible", and you may find the cause of the problem.
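For what it's worth, a sketch of where that matrix comes from: source() returns a list with components value and visible, and sapply() over several files binds one such list per file into exactly the 2-row matrix shown; assigning the result or wrapping the call in invisible() suppresses the printout.

```r
tmp <- tempfile(fileext = ".R")
writeLines("x <- 1 + 1", tmp)
res <- sapply(c(tmp, tmp), USE.NAMES = FALSE, FUN = source)
rownames(res)                                            # "value" "visible"
invisible(sapply(c(tmp, tmp), USE.NAMES = FALSE, FUN = source))  # silent
```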


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Jul 29, 2016 at 6:52 AM, Frank S. <f_j_...@hotmail.com> wrote:

> Dear list,
>
> I have one folder named "scripts_JMbayes", wich contains 10 R scripts.
> I can read them properly by doing:
>
> > pathnames <- list.files(pattern="[.]R", path="Mydir/scripts_JMbayes",
> full.names = TRUE)
> > sapply(pathnames, USE.NAMES = FALSE, FUN = source,)
>
> However, R generates the following message:
>
> [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]  [,10]
> value   ? ? ? ? ? ? ? ? ? ?
> visible FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
>
> What does it mean and what should I change to avoid this message?
> Any help would be appreciated!
>
> Best,
>
> Frank
>
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] Subtraction with aggregate

2016-07-28 Thread jim holtman
One thing to watch out for: are there always two samples (one of each type)
for each subject?  You had better sort by the emotion to make sure that
when you take the difference, the data are always in the same order.  Here
is an example in which some of these irregular cases are ignored:


> library(dplyr)
> mydata <- read.table(text = "subject   QMemotion yi
+  s0  123  neutral   321  # only one sample
+  s5  123 neutral 321   # three samples
+  s5  321 negative  345
+  s5  345 what  1234
+  s6 456 neutral 567   # two emotions the same
+  s6 567 neutral 123
+s1   75.1017   neutral  -75.928276
+s2  -47.3512   neutral -178.295990
+s3  -68.9016   neutral -134.753906
+s1   17.2099  negative -104.168312
+s2  -53.1114  negative -182.373474
+s3  -33.0322  negative -137.420410", header = TRUE, as.is = TRUE)
>
> agg <- mydata %>%
+ arrange(desc(emotion)) %>%  # sort
+ group_by(subject) %>%
+ filter(n() == 2 && emotion[1L] != emotion[2L]) %>%  # test for 2 emotions that differ
+ summarise(QM = QM[1L] - QM[2L],
+   yi = yi[1L] - yi[2L]
+   )
>
>
> agg
# A tibble: 3 x 3
  subject       QM        yi
    <chr>    <dbl>     <dbl>
1      s1  57.8918 28.240036
2      s2   5.7602  4.077484
3      s3 -35.8694  2.666504
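The aggregate()/diff() one-liner mentioned later in the thread illustrates the same ordering caveat — diff() returns second minus first, so the row order within each subject flips the sign. A sketch on two rows of the posted data:

```r
mydata <- data.frame(subject = c("s1", "s1"),
                     QM      = c(75.1017, 17.2099),   # neutral row first
                     emotion = c("neutral", "negative"),
                     yi      = c(-75.928276, -104.168312))
aggregate(cbind(QM, yi) ~ subject, data = mydata, FUN = diff)
# negative minus neutral here: QM -57.8918, yi -28.240036
```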


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Thu, Jul 28, 2016 at 5:21 PM, Gang Chen <gangch...@gmail.com> wrote:

> Hi Jim and Jeff,
>
> Thanks for the quick help!
>
> Sorry I didn't state the question clearly: I want the difference
> between 'neutral' and 'negative' for each subject. And another person
> offered a solution for it:
>
> aggregate(cbind(QM, yi) ~ subject, data = mydata, FUN = diff)
>
>
> On Thu, Jul 28, 2016 at 4:53 PM, jim holtman <jholt...@gmail.com> wrote:
> > Not sure what you mean by "nice way", but here is a dplyr solution:
> >
> >> library(dplyr)
> >> mydata <- read.table(text = "subject   QMemotion yi
> > +s1   75.1017   neutral  -75.928276
> > +s2  -47.3512   neutral -178.295990
> > +s3  -68.9016   neutral -134.753906
> > +s1   17.2099  negative -104.168312
> > +s2  -53.1114  negative -182.373474
> > +s3  -33.0322  negative -137.420410", header = TRUE)
> >> agg <- mydata %>%
> > + group_by(subject) %>%
> > + summarise(QM = mean(QM),
> > +   yi = mean(yi)
> > +   )
> >>
> >>
> >> agg
> > # A tibble: 3 x 3
> >   subject       QM         yi
> >    <fctr>    <dbl>      <dbl>
> > 1      s1  46.1558  -90.04829
> > 2      s2 -50.2313 -180.33473
> > 3      s3 -50.9669 -136.08716
> >
> >
> >
> > Jim Holtman
> > Data Munger Guru
> >
> > What is the problem that you are trying to solve?
> > Tell me what you want to do, not how you want to do it.
> >
> > On Thu, Jul 28, 2016 at 4:40 PM, Gang Chen <gangch...@gmail.com> wrote:
> >>
> >> With the following data in data.frame:
> >>
> >> subject   QMemotion yi
> >>   s1   75.1017   neutral  -75.928276
> >>   s2  -47.3512   neutral -178.295990
> >>   s3  -68.9016   neutral -134.753906
> >>   s1   17.2099  negative -104.168312
> >>   s2  -53.1114  negative -182.373474
> >>   s3  -33.0322  negative -137.420410
> >>
> >> I can obtain the average between the two emotions with
> >>
> >> mydata <- read.table('clipboard', header=TRUE)
> >> aggregate(mydata[,c('yi', 'QM')], by=list(subject=mydata$subject), mean)
> >>
> >> My question is, what is a nice way to get the difference between the
> >> two emotions?
> >>
> >> Thanks,
> >> Gang
> >>
> >> __
> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> >
>



Re: [R] Subtraction with aggregate

2016-07-28 Thread jim holtman
Not sure what you mean by "nice way", but here is a dplyr solution:

> library(dplyr)
> mydata <- read.table(text = "subject   QMemotion yi
+s1   75.1017   neutral  -75.928276
+s2  -47.3512   neutral -178.295990
+s3  -68.9016   neutral -134.753906
+s1   17.2099  negative -104.168312
+s2  -53.1114  negative -182.373474
+s3  -33.0322  negative -137.420410", header = TRUE)
> agg <- mydata %>%
+ group_by(subject) %>%
+ summarise(QM = mean(QM),
+   yi = mean(yi)
+   )
>
>
> agg
# A tibble: 3 x 3
  subject       QM         yi
   <fctr>    <dbl>      <dbl>
1      s1  46.1558  -90.04829
2      s2 -50.2313 -180.33473
3      s3 -50.9669 -136.08716



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Thu, Jul 28, 2016 at 4:40 PM, Gang Chen <gangch...@gmail.com> wrote:

> With the following data in data.frame:
>
> subject   QMemotion yi
>   s1   75.1017   neutral  -75.928276
>   s2  -47.3512   neutral -178.295990
>   s3  -68.9016   neutral -134.753906
>   s1   17.2099  negative -104.168312
>   s2  -53.1114  negative -182.373474
>   s3  -33.0322  negative -137.420410
>
> I can obtain the average between the two emotions with
>
> mydata <- read.table('clipboard', header=TRUE)
> aggregate(mydata[,c('yi', 'QM')], by=list(subject=mydata$subject), mean)
>
> My question is, what is a nice way to get the difference between the
> two emotions?
>
> Thanks,
> Gang
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] about file name

2016-07-28 Thread jim holtman
add another step (you will want to learn about regular expressions):

> a
[1] "X35.84375_.100.71875"
> a.new <- sub("^.", '', a)
> a.new
[1] "35.84375_.100.71875"
> sub("_.", "_-", a.new)
[1] "35.84375_-100.71875"
>
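Putting the two substitutions together for the original goal (a sketch; note that "." in the pattern "_." matches any character, so fixed = TRUE on that step would be stricter):

```r
a <- "X35.84375_.100.71875"
fname <- sub("_.", "_-", sub("^X", "", a))    # "35.84375_-100.71875"
df <- data.frame(v = 1:3)                     # stand-in for the poster's df
write.table(df, file = file.path(tempdir(), fname),
            row.names = FALSE, col.names = FALSE)
```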


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Thu, Jul 28, 2016 at 4:39 PM, lily li <chocol...@gmail.com> wrote:

> Thanks, but how to get the string like this:
> "35.84375_-100.71875" use the minus sign instead of dot.
>
> On Thu, Jul 28, 2016 at 2:38 PM, jim holtman <jholt...@gmail.com> wrote:
>
>> just strip off the first character:
>>
>> > a
>> [1] "X35.84375_.100.71875"
>> > a.new <- sub("^.", '', a)
>> > a.new
>> [1] "35.84375_.100.71875"
>> >
>>
>>
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>>
>> On Thu, Jul 28, 2016 at 3:51 PM, lily li <chocol...@gmail.com> wrote:
>>
>>> Hi R users,
>>>
>>> I have a string for example 'X35.84375_.100.71875', and I have another
>>> dataframe df that I want to export with the transformed string name
>>> '35.84375_-100.71875' with no extension. How to do this in R? Thanks for
>>> your help.
>>>
>>> a = 'X35.84375_.100.71875'
>>> write.table(df, file='', row.names=F, col.names=F)
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>



Re: [R] about file name

2016-07-28 Thread jim holtman
just strip off the first character:

> a
[1] "X35.84375_.100.71875"
> a.new <- sub("^.", '', a)
> a.new
[1] "35.84375_.100.71875"
>


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Thu, Jul 28, 2016 at 3:51 PM, lily li <chocol...@gmail.com> wrote:

> Hi R users,
>
> I have a string for example 'X35.84375_.100.71875', and I have another
> dataframe df that I want to export with the transformed string name
> '35.84375_-100.71875' with no extension. How to do this in R? Thanks for
> your help.
>
> a = 'X35.84375_.100.71875'
> write.table(df, file='', row.names=F, col.names=F)
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] means by year, month and day

2016-07-17 Thread jim holtman
Here is an example of using dplyr.  Please provide a reasonable subset of
data; yours was all for the same date.  Use 'dput' to include it in your
email.
> x <- read.table(text = " X.YY MM DD hh WDI R.WSP D.GST   PRES  ATMP  DEWP
+   2015  1  1  0 328   3.6   4.5 1028.0   3.8  -3.5
+   2015  1  1  1 300   2.1   2.7 1027.9   3.7  -4.4
+   2015  1  1  2 264   2.4   2.9 1027.7   3.6  -4.5
+   2015  1  1  3 230   4.1   4.5 1027.4   4.2  -3.8
+   2015  1  1  4 242   8.1   9.2 1026.6   4.4  -3.1
+   2015  1  1  5 262   9.3  10.1 1026.6   4.1  -3.8
+   2015  1  1  6 267   8.6   9.6 1026.3   4.2  -3.8
+   2015  1  1  7 264   9.3   9.9 1026.1   3.9  -2.8
+   2015  1  1  8 268   8.2   9.1 1026.1   3.5  -3.0
+   2015  1  1  9 272   8.8   9.6 1025.4   3.2  -3.3
+   2015  2  1  0 328   3.6   4.5 1028.0   3.8  -3.5
+   2015  2  1  1 300   2.1   2.7 1027.9   3.7  -4.4
+   2015  2  1  2 264   2.4   2.9 1027.7   3.6  -4.5
+   2015  2  1  3 230   4.1   4.5 1027.4   4.2  -3.8
+   2015  2  1  4 242   8.1   9.2 1026.6   4.4  -3.1
+   2015  2  1  5 262   9.3  10.1 1026.6   4.1  -3.8
+   2015  2  1  6 267   8.6   9.6 1026.3   4.2  -3.8
+   2015  2  1  7 264   9.3   9.9 1026.1   3.9  -2.8
+   2015  2  1  8 268   8.2   9.1 1026.1   3.5  -3.0
+   2015  2  1  9 272   8.8   9.6 1025.4   3.2  -3.3
+   2015  3  1  0 328   3.6   4.5 1028.0   3.8  -3.5
+   2015  3  1  1 300   2.1   2.7 1027.9   3.7  -4.4
+   2015  3  1  2 264   2.4   2.9 1027.7   3.6  -4.5
+   2015  3  1  3 230   4.1   4.5 1027.4   4.2  -3.8
+   2015  3  1  4 242   8.1   9.2 1026.6   4.4  -3.1
+   2015  3  1  5 262   9.3  10.1 1026.6   4.1  -3.8
+   2015  3  1  6 267   8.6   9.6 1026.3   4.2  -3.8
+   2015  3  1  7 264   9.3   9.9 1026.1   3.9  -2.8
+   2015  3  1  8 268   8.2   9.1 1026.1   3.5  -3.0
+   2015  3  1  9 272   8.8   9.6 1025.4   3.2  -3.3
+  ",
+  header = TRUE,
+  as.is = TRUE)
>
>  library(dplyr)
>  by_year <- x %>%
+ group_by(X.YY) %>%
+ summarise_each(funs(mean))
>
>  by_ym <- x %>%
+ group_by(X.YY, MM ) %>%
+ summarise_each(funs(mean))
>
>  by_ymd <- x %>%
+ group_by(X.YY, MM, DD) %>%
+ summarise_each(funs(mean))
>
> by_year
Source: local data frame [1 x 10]
   X.YY    MM    DD    hh   WDI R.WSP D.GST    PRES  ATMP  DEWP
  <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>
1  2015     2     1   4.5 269.7  6.45  7.21 1026.81  3.86  -3.6
> by_ym
Source: local data frame [3 x 10]
Groups: X.YY [?]
   X.YY    MM    DD    hh   WDI R.WSP D.GST    PRES  ATMP  DEWP
  <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>
1  2015     1     1   4.5 269.7  6.45  7.21 1026.81  3.86  -3.6
2  2015     2     1   4.5 269.7  6.45  7.21 1026.81  3.86  -3.6
3  2015     3     1   4.5 269.7  6.45  7.21 1026.81  3.86  -3.6
> by_ymd
Source: local data frame [3 x 10]
Groups: X.YY, MM [?]
   X.YY    MM    DD    hh   WDI R.WSP D.GST    PRES  ATMP  DEWP
  <int> <int> <int> <dbl> <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>
1  2015     1     1   4.5 269.7  6.45  7.21 1026.81  3.86  -3.6
2  2015     2     1   4.5 269.7  6.45  7.21 1026.81  3.86  -3.6
3  2015     3     1   4.5 269.7  6.45  7.21 1026.81  3.86  -3.6
>
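The same daily means can be had in base R, for anyone without dplyr — a sketch on a two-day toy frame, not the poster's data:

```r
x <- data.frame(X.YY = 2015,
                MM   = 1,
                DD   = rep(1:2, each = 2),
                hh   = c(0, 1, 0, 1),
                WDI  = c(328, 300, 264, 230))
# one row per year/month/day, with the remaining columns averaged
aggregate(cbind(hh, WDI) ~ X.YY + MM + DD, data = x, FUN = mean)
```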


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Jul 17, 2016 at 5:42 PM, Jianling Fan <fanjianl...@gmail.com> wrote:

> Hello Tom,
>
> try aggregate() or cast(). Both works.I prefer the latter.
>
>
> library(reshape)
> desc<-melt(mydata, measure.vars=c("WDI","R.WSP", "D.GST", "PRES",
> "ATMP", "DEWP"),
>id.vars=c("X.YY","MM","DD"))
> summary<-cast(desc, X.YY+MM+DD~variable, mean)
>
> On 17 July 2016 at 06:22, Tom Mosca <t...@vims.edu> wrote:
> > Hello Good Folk,
> >
> > My dataframe looks like this:
> >> mydata
> >  X.YY MM DD hh WDI R.WSP D.GST   PRES  ATMP  DEWP
> > 1    2015  1  1  0 328   3.6   4.5 1028.0   3.8  -3.5
> > 2    2015  1  1  1 300   2.1   2.7 1027.9   3.7  -4.4
> > 3    2015  1  1  2 264   2.4   2.9 1027.7   3.6  -4.5
> > 4    2015  1  1  3 230   4.1   4.5 1027.4   4.2  -3.8
> > 5    2015  1  1  4 242   8.1   9.2 1026.6   4.4  -3.1
> > 6    2015  1  1  5 262   9.3  10.1 1026.6   4.1  -3.8
> > 7    2015  1  1  6 267   8.6   9.6 1026.3   4.2  -3.8
> > 8    2015  1  1  7 264   9.3   9.9 1026.1   3.9  -2.8
> > 9    2015  1  1  8 268   8.2   9.1 1026.1   3.5  -3.0
> > 10   2015  1  1  9 272   8.8   9.6 1025.4   3.2  -3.3 …
> >
> > The first four columns are year, month, day, hour (0 – 23).  I wish to
> take the means of the next six columns (WDIR, WSPD, GST, PRES, ATMP and
> DEWP) by year, month and day.  That is, I want daily averages.
> >
> > Please help.  Thank you.
> >
> > Tom
> >
> > [[alternative HTML version deleted]]

Re: [R] Difficulty subsetting data frames using logical operators

2016-07-01 Thread jim holtman
You may need to re-read the Intro to R.

data[data$Ozone > 31,]

or

subset(data, Ozone > 31)
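On the NA part of the question: comparisons against NA return NA, so bracket indexing needs an explicit !is.na(), whereas subset() drops those rows for you. A sketch using R's built-in airquality data, which has the same columns as the file in the post:

```r
data <- airquality  # built-in; columns Ozone, Solar.R, Wind, Temp, Month, Day

# subset() treats NA conditions as not selected, so this just works:
high <- subset(data, Ozone > 31 & Temp > 90)

# Equivalent bracket indexing; without the !is.na() terms, rows where the
# condition is NA would come back as all-NA rows instead of being dropped:
keep  <- !is.na(data$Ozone) & !is.na(data$Temp) &
         data$Ozone > 31 & data$Temp > 90
high2 <- data[keep, ]

# Mean ozone over the selected rows:
mean(high$Ozone)
```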


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Fri, Jul 1, 2016 at 5:11 AM, Giles Bischoff <g...@st-andrews.ac.uk>
wrote:

> So, I uploaded a data set via my directory using the command data <-
> data.frame(read.csv("hw1_data.csv")) and then tried to subset that data
> using logical operators. Specifically, I was trying to make it so that I
> got all the rows in which the values for "Ozone" (a column in the data set)
> were greater than 31 (I was trying to get the mean of all said values).
> Then, I tried using the command data[ , "Ozone">31]. Additionally, I had
> trouble getting it so that I had all the rows where all the values in
> "Ozone">31 & "Temp">90 simultaneously. There were some NA values in both of
> those columns, so that might be it. If someone could help me to figure out
> how to remove those values, that'd be great as well. I'm using a Mac (OS X)
> with the latest version of R (3.1.2. I think??).
>
> Here is some of the code I used:
>
> >data <- data.frame(read.csv("hw1_data.csv"))
> > data
> Ozone Solar.R Wind Temp Month Day
> 1  41 190  7.4   67 5   1
> 2  36 118  8.0   72 5   2
> 3  12 149 12.6   74 5   3
> 4  18 313 11.5   62 5   4
> 5  NA  NA 14.3   56 5   5
> 6  28  NA 14.9   66 5   6
> 7  23 299  8.6   65 5   7
> 8  19  99 13.8   59 5   8
> 9   8  19 20.1   61 5   9
> 10 NA 194  8.6   69 5  10
> 11  7  NA  6.9   74 5  11
> 12 16 256  9.7   69 5  12
> 13 11 290  9.2   66 5  13
> 14 14 274 10.9   68 5  14
> 15 18  65 13.2   58 5  15
> 16 14 334 11.5   64 5  16
> 17 34 307 12.0   66 5  17
> 18  6  78 18.4   57 5  18
> 19 30 322 11.5   68 5  19
> 20 11  44  9.7   62 5  20
> 21  1   8  9.7   59 5  21
> 22 11 320 16.6   73 5  22
> 23  4  25  9.7   61 5  23
> 24 32  92 12.0   61 5  24
> 25 NA  66 16.6   57 5  25
> 26 NA 266 14.9   58 5  26
> 27 NA  NA  8.0   57 5  27
> 28 23  13 12.0   67 5  28
> 29 45 252 14.9   81 5  29
> 30 115 223  5.7   79 5  30
> 31 37 279  7.4   76 5  31
> 32 NA 286  8.6   78 6   1
> 33 NA 287  9.7   74 6   2
> 34 NA 242 16.1   67 6   3
> 35 NA 186  9.2   84 6   4
> 36 NA 220  8.6   85 6   5
> 37 NA 264 14.3   79 6   6
> 38 29 127  9.7   82 6   7
> 39 NA 273  6.9   87 6   8
> 40 71 291 13.8   90 6   9
> 41 39 323 11.5   87 6  10
> 42 NA 259 10.9   93 6  11
> 43 NA 250  9.2   92 6  12
> 44 23 148  8.0   82 6  13
> 45 NA 332 13.8   80 6  14
> 46 NA 322 11.5   79 6  15
> 47 21 191 14.9   77 6  16
> 48 37 284 20.7   72 6  17
> 49 20  37  9.2   65 6  18
> 50 12 120 11.5   73 6  19
> 51 13 137 10.3   76 6  20
> 52 NA 150  6.3   77 6  21
> 53 NA  59  1.7   76 6  22
> 54 NA  91  4.6   76 6  23
> 55 NA 250  6.3   76 6  24
> 56 NA 135  8.0   75 6  25
> 57 NA 127  8.0   78 6  26
> 58 NA  47 10.3   73 6  27
> 59 NA  98 11.5   80 6  28
> 60 NA  31 14.9   77 6  29
> 61 NA 138  8.0   83 6  30
> 62 135 269  4.1   84 7   1
> 63 49 248  9.2   85 7   2
> 64 32 236  9.2   81 7   3
> 65 NA 101 10.9   84 7   4
> 66 64 175  4.6   83 7   5
> 67 40 314 10.9   83 7   6
> 68 77 276  5.1   88 7   7
> 69 97 267  6.3   92 7   8
> 70 97 272  5.7   92 7   9
> 71 85 175  7.4   89 7  10
> 72 NA 139  8.6   82 7  11
> 73 10 264 14.3   73 7  12
> 74 27 175 14.9   81 7  13
> 75 NA 291 14.9   91 7  14
> 76  7  48 14.3   80 7  15
> 77 48 260  6.9   81 7  16
> 78 35 274 10.3   82 7  17
> 79 61 285  6.3   84 7  18
> 80 79 187  5.1   87 7  19
> 81 63 220 11.5   85 7  20
> 82 16   7  6.9   74 7  21
> 83 NA 258  9.7   81 7  22
> 84 NA 295 11.5
