Hi All, I need some help to get started on a script.
I have these huge data files, each with 16K rows and several columns, and I need to extract a subset of those rows. Each row has an identifier made up of 2 letters and 6 digits, and the ones I want start with either C or D. I know I can use a regex for the matching, but I have been trying to figure out the rest and I don't know where to start. This is the first time I am trying to write something from scratch, so any suggestions would be appreciated. I am not asking for the script, just some help on how to go about it.

So, what I want to be able to do is retrieve all the rows whose identifiers start with C or D. Should I use arrays? Can I store each row (a set of values separated by tabs) as one item in an array?

Here is an example of what the file looks like. I would like to use the Gene ID field to select the rows.

Field Meta Row Meta Column Row Column Gene ID Annotation 1 Flag Signal Mean Background Mean Signal Median Background Median Signal Mode Background Mode Signal Area Background Area Signal Total
A 2 1 9 9 AA067532 Arabidopsis Negative Control 2 352.9428 203.4924 77 1 168.1093 55.8592 70 329 24706
A 2 1 9 10 AA067532 Arabidopsis Negative Control 2 352.4057 213.3951 99 1 44.659 48.423 69 329 24316

Thanks,
Tiago

--
"Education is not to be used to promote obscurantism." - Theodosius Dobzhansky

"Gracias a la vida que me ha dado tanto
Me ha dado el sonido y el abecedario
Con él, las palabras que pienso y declaro
Madre, amigo, hermano
Y luz alumbrando la ruta del alma del que estoy amando
Gracias a la vida que me ha dado tanto
Me ha dado la marcha de mis pies cansados
Con ellos anduve ciudades y charcos
Playas y desiertos, montañas y llanos
Y la casa tuya, tu calle y tu patio"
Violeta Parra - Gracias a la Vida

Tiago S. F. Hori
PhD Candidate - Ocean Science Center-Memorial University of Newfoundland
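
One possible way to approach the question above, sketched in Python. The post does not name a language, so Python is an assumption here, as are the tab delimiter, a header on the first line, and a column literally named "Gene ID". The function name `select_rows` and the C/D identifiers in the sample are made up for illustration. Each row is stored as a list of fields, and the kept rows are collected in a list, which is the "one row per array item" idea from the question.

```python
import csv
import io
import re

# Assumed ID format: first letter C or D, one more letter, then six digits.
ID_PATTERN = re.compile(r"^[CD][A-Z]\d{6}$")


def select_rows(handle, id_column="Gene ID"):
    """Return (header, rows), keeping only rows whose ID matches ID_PATTERN."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)                # first line holds the column names
    id_index = header.index(id_column)   # position of the Gene ID column
    kept = [
        row for row in reader
        if len(row) > id_index and ID_PATTERN.match(row[id_index].strip())
    ]
    return header, kept


# Tiny in-memory sample; the CA/DB identifiers are invented for illustration.
sample = (
    "Field\tGene ID\tSignal Mean\n"
    "A\tAA067532\t352.9428\n"
    "A\tCA123456\t101.5\n"
    "B\tDB000001\t87.25\n"
)
header, rows = select_rows(io.StringIO(sample))
for row in rows:
    print("\t".join(row))
```

For a real file, the same function could be given a handle from `open(path, newline="")` instead of the `io.StringIO` sample. At 16K rows the whole subset fits comfortably in memory, so keeping the matching rows in a list should be fine.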