Re: [gentoo-user] [OT] - command line read *.csv & create new file

Mark Knecht Tue, 24 Feb 2009 06:42:02 -0800

On Tue, Feb 24, 2009 at 2:56 AM, Etaoin Shrdlu <shr...@unlimitedmail.org> wrote:
<SNIP>
>
> So, in my understanding this is what we want to accomplish so far:
>
> given an input of the form
>
> D1,T1,a1,b1,c1,d1,...,R1
> D2,T2,a2,b2,c2,d2,...,R2
> D3,T3,a3,b3,c3,d3,...,R3
> D4,T4,a4,b4,c4,d4,...,R4
> D5,T5,a5,b5,c5,d5,...,R5
>
> (the ... mean that an arbitrary number of columns can follow)
>
> You want to group lines by n at a time, keeping the D and T column from
> the first line of each group, and keeping the R column from the last
> line of the group, so for example with n=3 we would have:
>
> D1,T1,a1,b1,c1,d1,...a2,b2,c2,d2,...a3,b3,c3,d3,...R3
> D2,T2,a2,b2,c2,d2,...a3,b3,c3,d3,...a4,b4,c4,d4,...R4
> D3,T3,a3,b3,c3,d3,...a4,b4,c4,d4,...a5,b5,c5,d5,...R5
>
> (and you're right, that produces an output that is roughly n times the
> size of the original file)
>
> Now, in addition to that, you also want to drop an arbitrary number of
> columns in the a,b,c... group. So for example, you want to drop columns
> 2 and 3 (b and c in the example), so you'd end up with something like
>
> D1,T1,a1,d1,...a2,d2,...a3,d3,...R3
> D2,T2,a2,d2,...a3,d3,...a4,d4,...R4
> D3,T3,a3,d3,...a4,d4,...a5,d5,...R5
>
> Please confirm that my understanding is correct, so I can come up with
> some code to do that.


Perfectly correct for all the data rows.
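
For reference, here is a rough Python sketch of the row handling as described in the quoted text. It is only an illustration under stated assumptions, not the code being discussed in this thread: the function name window_rows and the 1-based drop set are made up for the example, each row is assumed to be already split into fields, and D and T are taken from the first line of each group, as the wording of the description says.

```python
from collections import deque
from typing import Iterable, Iterator, List, Set


def window_rows(rows: Iterable[List[str]], n: int, drop: Set[int]) -> Iterator[List[str]]:
    """Yield one output row per window of n consecutive input rows.

    Each input row is assumed to be [D, T, middle columns..., R].
    `drop` holds 1-based positions within the middle columns to leave out
    (hypothetical convention: 1 is the first column after T).
    """
    window = deque(maxlen=n)  # holds only the last n rows, so memory stays small
    for row in rows:
        window.append(row)
        if len(window) < n:
            continue  # not enough rows yet to form a full group
        out = window[0][:2]  # D and T from the first row of the group
        for r in window:
            middle = r[2:-1]  # everything between T and R
            out.extend(v for i, v in enumerate(middle, start=1) if i not in drop)
        out.append(window[-1][-1])  # R from the last row of the group
        yield out


if __name__ == "__main__":
    # Five made-up rows in the shape used in the quoted example.
    sample = [[f"D{i}", f"T{i}", f"a{i}", f"b{i}", f"c{i}", f"d{i}", f"R{i}"]
              for i in range(1, 6)]
    for line in window_rows(sample, n=3, drop={2, 3}):
        print(",".join(line))
```

Because only n rows are held at a time, the same loop should work on a very large input if the rows come from a streaming reader such as csv.reader.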

For the header I now see that we have a slightly harder job. What we'd
need to do is read the first line of the file, duplicate it N times,
and then drop the same columns as we drop in the rows. The problem is
that I would then have the same header value in N columns, which won't
make sense to the tool that uses this data. It would help if we could
read the header and then automatically append the number N to each
duplicated name (or some string like _N).
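
To make the header idea concrete, here is a small Python sketch of what I mean. It is an assumption-laden illustration only: build_header is a made-up name, the column names in the example are invented, the suffix is taken to be the copy number 1..N, and the first two and the last header names are assumed to stay unsuffixed because they appear only once in the output.

```python
from typing import List, Set


def build_header(header: List[str], n: int, drop: Set[int]) -> List[str]:
    """Repeat the middle header names n times, suffixing each copy with _1 .. _n.

    `header` is assumed to be [D name, T name, middle names..., R name].
    `drop` holds 1-based positions within the middle names to leave out,
    the same positions that are dropped from the data rows.
    """
    middle = header[2:-1]
    kept = [name for i, name in enumerate(middle, start=1) if i not in drop]
    out = list(header[:2])  # D and T names appear once, unchanged
    for k in range(1, n + 1):
        out.extend(f"{name}_{k}" for name in kept)  # e.g. a_1, d_1, a_2, d_2, ...
    out.append(header[-1])  # R name appears once, unchanged
    return out


if __name__ == "__main__":
    # Invented column names, just to show the shape of the result.
    names = ["Date", "Time", "a", "b", "c", "d", "Result"]
    print(",".join(build_header(names, n=3, drop={2, 3})))
```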

Maybe better would be a separate small program to do the header part
and then this program could read that header and make it the first
line of the output file. My worry is that when this data file becomes
very large - say 1GB or more of data - I probably cannot open the file
with vi to edit the header. Better if I could put the header in its
own file. That file would be 1 line long. I could check it for the
name edits, make sure it's right, and then the program you are so
kindly building would just read it, cut out columns, and put it at the
start of the new large file.
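
As a sketch of that last step, something along these lines is what I have in mind, written in Python purely for illustration. The name write_with_header and the file paths are hypothetical, and the header file is assumed to hold exactly one already-edited comma-separated line. Nothing is loaded into memory beyond one row at a time, so the size of the data should not matter.

```python
import csv
from typing import Iterable, List


def write_with_header(header_path: str, rows: Iterable[List[str]], out_path: str) -> None:
    """Write the one-line header file first, then stream the data rows after it."""
    with open(header_path, newline="") as hf:
        header = next(csv.reader(hf))  # the single header line, split into names
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        for row in rows:  # rows can be a generator, so the input is never held whole
            writer.writerow(row)
```

The rows argument can be any iterable of already-processed rows, so the header check stays a separate, manual step on a tiny file.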

Does that make sense?

>
>> I found a web site to study awk so I'm starting to see more or less
>> how your example works when I have the code in front of me. Creating
>> the code out of thin air might be a bit of a stretch for me at this
>> point though.
>
> I suggest you start from
>
> http://www.gnu.org/software/gawk/manual/gawk.html
>
> really complete, but gradual so you can have an easy start and move on to
> the complexities later.
>
>

Yes, very complete. A good reference. Thanks!

Cheers,
Mark
