On Sun, Feb 22, 2009 at 12:15 PM, Etaoin Shrdlu <shr...@unlimitedmail.org> wrote:
> On Sunday 22 February 2009, 20:06, Mark Knecht wrote:
>> Hi,
>> Very off topic other than I'd do this on my Gentoo box prior to
>> using R on my Gentoo box. Please ignore if not of interest.
>>
>> I've got a really big data file in essentially a *.csv format
>> (comma delimited). I need to scan this file and create a new output
>> file. I'm wondering if there is a reasonably easy command line way of
>> doing this using something like sed or awk, which I know nothing about.
>> Thanks in advance.
>>
>> The basic idea goes something like this:
>>
>> 1) The input file might look like the following, where some of it is
>> attributes (shown as letters) and other parts are results (shown as
>> numbers):
>>
>> A,B,C,D,1
>> E,F,G,H,2
>> I,J,K,L,3
>> M,N,O,P,4
>> Q,R,S,T,5
>> U,V,W,X,6
>
> Are the results always in the last field, and only a single field?
> Is the total number of fields per line always fixed?
I don't know that for certain yet, but I think the results will not always be in the last field. The total number of fields per line is always fixed in a given file but might change from file to file. If it does, I'm willing to do minor edits (heck, I'll do major edits if I have to!!) to get it working.

>
>> 2) From the above data input file I want to take the attributes from a
>> few preceding lines (say 3 in this example) and write them to the
>> output file along with the result on the last of the 3 lines. The
>> output file might look like this:
>>
>> A,B,C,D,E,F,G,H,I,J,K,L,3
>> E,F,G,H,I,J,K,L,M,N,O,P,4
>> I,J,K,L,M,N,O,P,Q,R,S,T,5
>> M,N,O,P,Q,R,S,T,U,V,W,X,6
>
> Is the number of lines you pick for the operation always 3 or can it
> vary? And, once you choose a number n of lines, should the whole file be
> processed concatenating n lines at a time, and the resulting single line
> be ended with the result of the nth line? In other words, does the
> following hold for the output format:
>
> <concatenation of attributes of lines 1..n> <result of line n>
> <concatenation of attributes of lines 2..n+1> <result of line n+1>
> <concatenation of attributes of lines 3..n+2> <result of line n+2>
> <concatenation of attributes of lines 4..n+3> <result of line n+3>

The above diagram is correct when the number of lines chosen is 3. I suspect that I might choose 10 or 15 lines once I get real data and do some testing, but that was harder to show in this email. A good design for me would be a single variable I could set. Once a value is chosen I want to process every line in the input file the same way. I don't use 5 lines sometimes and 10 lines other times. In a given file it's always the same number of lines.

> ...
>
> With answers to the above questions, it's probably possible to hack
> together a solution.

Thanks!
- Mark
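For reference, the sliding-window layout agreed on above (attributes of lines k..k+n-1, then the result of line k+n-1) can be sketched in Python rather than sed or awk. This is a minimal sketch under stated assumptions, not a finished tool: it assumes the result is a single field at a fixed column index (defaulting to the last field, since the thread has not settled where it lives), and that every row in a given file has the same number of fields. It streams rows through a fixed-size buffer, so a really big file does not need to fit in memory. The names `window_rows`, `main` and `window.py` are hypothetical.

```python
import csv
import sys
from collections import deque


def window_rows(rows, n, result_col=-1):
    """Yield one output row per complete window of n consecutive input rows.

    Assumption: the result is a single field at index result_col (default:
    last field). Each output row is the attributes (every other field) of
    the n rows in order, followed by the result of the last row in the window.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    window = deque(maxlen=n)  # keeps only the most recent n rows
    for row in rows:
        window.append(row)
        if len(window) < n:
            continue  # first window not full yet, nothing to emit
        out = []
        for r in window:
            col = result_col % len(r)  # turn a negative index into a positive one
            out.extend(f for i, f in enumerate(r) if i != col)
        out.append(window[-1][result_col])
        yield out


def main(n, result_col=-1):
    # Read CSV on stdin, write the windowed CSV on stdout.
    reader = csv.reader(sys.stdin)
    writer = csv.writer(sys.stdout, lineterminator="\n")
    for out in window_rows(reader, n, result_col):
        writer.writerow(out)


if __name__ == "__main__":
    # Hypothetical usage: python window.py N [RESULT_COL] < input.csv > output.csv
    if len(sys.argv) > 1:
        col = int(sys.argv[2]) if len(sys.argv) > 2 else -1
        main(int(sys.argv[1]), col)
```

With the six sample lines and `python window.py 3`, this should print the four output lines from the example. The window size is the single variable requested (e.g. `python window.py 10`), and a second argument such as `0` would pick the first field as the result column if it turns out not to be the last one.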