Hi

You should send your responses to the R-help list; others could offer better or different solutions.
I am not an expert in regex myself, so if all your files are formatted the same way I would use strsplit.

# read the record header into object test
test <- readLines("clipboard")
str(test)
 chr [1:4] "PATIENT NAME: CONFIDENTIAL,#12345" "PATIENT ID #: 12345" ...

# here is something similar to your csv file (header = TRUE so the Id1/Id2/VisitDate names are kept)
test2 <- read.table("clipboard", header = TRUE)
test2
    Id1   Id2 VisitDate
1 12345 12345  4/3/2018
2 11111 11111  5/4/2018

# split the second line of the patient record, select the 4th item and compare it with Id2 from the csv file
sel <- which(test2$Id2 == as.numeric(unlist(strsplit(test[2], " "))[4]))
# take the third line of the patient record and split it
out <- unlist(strsplit(test[3], split = " "))
# replace the 4th item with the selected VisitDate value from the csv
out[4] <- as.character(test2$VisitDate[sel])
# here you should be aware of the difference between factors and characters
# and finally collapse the pieces into one line, which could replace the third line of the patient record
paste(out, collapse = " ")
[1] "DATE OF SERVICE: 4/3/2018"

But what do you want to do with it? This manipulates objects in your R session, not the original files. I believe there are other tools more suitable for such tasks.

Cheers
Petr

> -----Original Message-----
> From: Nicola Cecchino <ncecch...@gmail.com>
> Sent: Thursday, September 13, 2018 5:04 AM
> To: PIKAL Petr <petr.pi...@precheza.cz>
> Subject: Re: [R] Correcting dates in research / medical record using R
>
> Hi Petr,
>
> Thank you for your help, but I'm not sure what that code is supposed to do.
> I'm really new to regular expressions and am having difficulties with this
> whole thing.
>
> Nic
>
> On 9/12/2018 2:26 AM, PIKAL Petr wrote:
> > Hi
> >
> > First of all, you should not send HTML-formatted posts; there is a big
> > chance they get scrambled.
> >
> > You should compare your Id2 after the for loop with the result of
> >
> > clinicVdate[Id2, 'VisitDate']
> >
> > Most probably Id2 after the for loop does not match the row names of
> > clinicVdate.
> >
> > Cheers
> > Petr
> >
> >> -----Original Message-----
> >> From: R-help <r-help-boun...@r-project.org> On Behalf Of Nicola
> >> Cecchino
> >> Sent: Wednesday, September 12, 2018 3:50 AM
> >> To: R-help@r-project.org
> >> Subject: [R] Correcting dates in research / medical record using R
> >>
> >> Hi,
> >>
> >> I'm not that well versed in R; I'm trying to correct the dates of
> >> service in a de-identified research medical record of several subjects.
> >> The correct dates come from a csv file, in the VisitDate column,
> >> which looks like this in Excel. The empty cells have other data in
> >> them that I don't need, and the file name is DateR.csv:
> >>
> >>   Id1     Id2                          VisitDate
> >>   12345   12345                        4/3/2018
> >>
> >> The research medical record is a text file, and the "DATE OF SERVICE"
> >> in the top matter is wrong for all of the subjects and needs to be
> >> replaced with the "VisitDate" from the csv file. The file name for the
> >> medical records is test3.NEW. Here is a screen grab of the top
> >> matter of the research medical record; below this excerpt is
> >> other gathered data for that subject:
> >>
> >> ================================================================================
> >>
> >> PATIENT NAME: CONFIDENTIAL,#12345
> >> PATIENT ID #: 12345
> >> DATE OF SERVICE: 04/10/2018
> >> ACCESSION NUMBER: RR1234567
> >>
> >> TEST PROCEDURE HIGH/LOW TEST RESULTS UNITS NORMAL VALUES
> >>
> >> As described above, I need to update the DATE OF SERVICE date in the
> >> text file with the VisitDate from the csv file.
> >>
> >> I made several unsuccessful attempts at this, so now I turn to you.
> >> Here is the code that shows my attempts:
> >>
> >> clinicVdate <- read.csv("DateR.csv")
> >>
> >> rownames(clinicVdate) <- as.character(clinicVdate[,'Id2'])
> >>
> >> Id2 <- NA
> >>
> >> input_data <- readLines("D:/test/test3.NEW")
> >> output_data <- c()
> >>
> >> for(input_line in input_data){
> >>   output_line = input_line
> >>   if(length(grep('PATIENT ID #:', input_line))>0) {
> >>     Id2 = as.character(strsplit(input_line, ':')[[1]][2])
> >>   }
> >>
> >>   if (length(grep( 'DATE OF SERVICE: ', input_line))){
> >>
> >>     output_line = paste('DATE OF SERVICE', clinicVdate[Id2, 'VisitDate'], sep=':')
> >>
> >>   }
> >>   output_data = paste(output_data, output_line, sep='\n')
> >> }
> >>
> >> cat(output_data)
> >>
> >> The code above removes the erroneous date but replaces it with NA.
> >> Here is an example of the result:
> >>
> >> ================================================================================
> >>
> >> PATIENT NAME: CONFIDENTIAL,#12345
> >> PATIENT ID #: 12345
> >> DATE OF SERVICE: NA
> >> ACCESSION NUMBER: RR1234567
> >>
> >> TEST PROCEDURE HIGH/LOW TEST RESULTS UNITS NORMAL VALUES
> >>
> >> Where am I going wrong? If I didn't pose my question appropriately,
> >> please let me know too!! Any help with this would be greatly appreciated!!
> >>
> >> Kind regards,
> >>
> >> Nic Cecchino
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> > Osobní údaje: Informace o zpracování a ochraně osobních údajů
> > obchodních partnerů PRECHEZA a.s.
> > jsou zveřejněny na:
> > https://www.precheza.cz/zasady-ochrany-osobnich-udaju/ | Information
> > about processing and protection of business partner’s personal data
> > are available on website:
> > https://www.precheza.cz/en/personal-data-protection-principles/
> > Důvěrnost: Tento e-mail a jakékoliv k němu připojené dokumenty jsou
> > důvěrné a podléhají tomuto právně závaznému prohlášení o vyloučení
> > odpovědnosti: https://www.precheza.cz/01-dovetek/ | This email and any
> > documents attached to it may be confidential and are subject to the
> > legally binding disclaimer: https://www.precheza.cz/en/01-disclaimer/
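The NA in the original loop most likely comes from the lookup key, not from the csv: splitting "PATIENT ID #: 12345" on ':' keeps the space after the colon, so Id2 becomes " 12345", which matches no row name of clinicVdate, and indexing a data frame by a non-existent row name returns NA. Below is a minimal R sketch of a corrected loop, assuming the record lines look exactly like the excerpt in the original post. The inline csv text and record lines are stand-ins for DateR.csv and test3.NEW, and fix_dates() and the output file name test3_fixed.NEW are hypothetical names used only for illustration.

```r
# Stand-in for DateR.csv (assumption: same Id1/Id2/VisitDate columns as the post).
# stringsAsFactors = FALSE keeps VisitDate as character in R versions before 4.0.
clinicVdate <- read.csv(text = "Id1,Id2,VisitDate
12345,12345,4/3/2018
11111,11111,5/4/2018", stringsAsFactors = FALSE)
rownames(clinicVdate) <- as.character(clinicVdate$Id2)

# Stand-in for the top matter of test3.NEW, copied from the excerpt in the post.
input_data <- c(
  "PATIENT NAME: CONFIDENTIAL,#12345",
  "PATIENT ID #: 12345",
  "DATE OF SERVICE: 04/10/2018",
  "ACCESSION NUMBER: RR1234567"
)

# Why the original loop gives NA: the key keeps the space after the colon.
raw_id <- strsplit(input_data[2], ":")[[1]][2]
print(raw_id)                           # " 12345"
print(clinicVdate[raw_id, "VisitDate"]) # NA, no row is named " 12345"

# Hypothetical helper: replace each DATE OF SERVICE line with the VisitDate
# belonging to the most recent PATIENT ID line seen above it.
fix_dates <- function(lines, dates) {
  id <- NA_character_
  out <- lines
  for (i in seq_along(lines)) {
    if (grepl("^PATIENT ID #:", lines[i])) {
      # trimws() removes the leading space that broke the row-name lookup
      id <- trimws(sub("^PATIENT ID #:", "", lines[i]))
    } else if (grepl("^DATE OF SERVICE:", lines[i]) &&
               !is.na(id) && id %in% rownames(dates)) {
      out[i] <- paste0("DATE OF SERVICE: ", dates[id, "VisitDate"])
    }
  }
  out
}

output_data <- fix_dates(input_data, clinicVdate)
writeLines(output_data)

# To work on the real file, read it and write the result to a new file, e.g.:
# writeLines(fix_dates(readLines("D:/test/test3.NEW"), clinicVdate),
#            "D:/test/test3_fixed.NEW")
```

In this sketch a DATE OF SERVICE line is left unchanged when its patient ID is not found in the csv, rather than being overwritten with NA. Writing to a new file instead of overwriting test3.NEW keeps the original record intact until the result has been checked.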