On Tuesday 24 February 2009, 23:51, Mark Knecht wrote:

> Looks like I'm running into one more problem and then I'm ready to
> give it a try for real. Unfortunately one vendor platform is putting
> quotes around the names in the header row so your _N increment looks
> like "High"_4 instead of High_4 or "High_4". I'd like to fix that as
> I'm pretty sure that the way we have it won't be acceptable, but I
> don't know whether it would be best to have the quotes or not have the
> quotes. My two target data mining platforms are R, which is in
> portage, and RapidMiner which is available as Open Source from the
> Rapid-i web site. I'll try it both ways with both header formats and
> see what happens.

OK, either way that is a minor fix; adjusting the program to emit
whichever form turns out to work is no big deal.
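
Until the program itself is adjusted, the header could be patched after
the fact. The following is only a sketch: the function names are made up
for illustration, and it assumes the names in the header row contain no
commas or embedded quotes. Both functions touch line 1 only, so the data
rows pass through unchanged.

```shell
# Hypothetical helpers, not part of the existing program.
# Both read CSV on stdin and rewrite only the header row (line 1).

# "High"_4 -> "High_4"  (suffix moved inside the quotes)
suffix_inside_quotes() {
    sed '1s/"\([^",]*\)"\(_[0-9][0-9]*\)/"\1\2"/g'
}

# "High"_4 -> High_4    (quotes dropped from suffixed names)
suffix_without_quotes() {
    sed '1s/"\([^",]*\)"\(_[0-9][0-9]*\)/\1\2/g'
}
```

Something like `suffix_inside_quotes < awkDataOut.csv > fixed.csv` would
then give you one variant to try in R and RapidMiner, and the other
function gives you the second variant.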

> I had worried about checking the header on a really large file to see
> if I had cut the correct columns but it turns out that
>
> cat awkDataOut.csv | more
>
> in a terminal writes the first few lines very quickly. From there I
> can either just look at it or copy/paste into a new csv file, load it
> into something like Open Office Calc and make sure I got the right
> columns so I don't think there's any practical need to do anything
> more with the header other than whatever turns out to be the right
> answer with the quotes. My worry had been that when I request 5 data
> columns it's not obvious what order they are provided so I'd have to
> look at the file and figure out where everything was. Turns out it's
> not such a big deal.
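
For a quick check of the header alone, without paging through the file,
something along these lines might be handy. It is a sketch with a
made-up function name, and it assumes there are no commas inside quoted
header names. It prints each header field next to its position.

```shell
# Hypothetical helper: list the header fields of a CSV file with their
# column positions, reading only the first line of the file.
show_header() {
    head -n 1 "$1" | awk -F, '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}
```

Calling `show_header awkDataOut.csv` prints one numbered line per column.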

The rule currently implemented is this: the columns you keep come out in
the same order they had in the original file. If the original has, say,
5 data columns (not counting date/time and result) and you ask to drop
columns 2 and 5, you get columns 1, 3 and 4, in that order.
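
To illustrate the ordering rule only (this is not the actual program),
here is a minimal awk sketch. The function name is made up, and it
numbers every comma-separated field from 1, so it ignores the special
handling of date/time and result.

```shell
# Hypothetical helper: drop the listed columns from CSV on stdin and
# keep the remaining ones in their original order.
# Usage: drop_columns 2,5 < in.csv
drop_columns() {
    awk -F, -v drop="$1" '
        BEGIN {
            OFS = ","
            n = split(drop, d, ",")
            for (i = 1; i <= n; i++) skip[d[i]] = 1   # columns to leave out
        }
        {
            out = ""; first = 1
            for (i = 1; i <= NF; i++) {
                if (i in skip) continue               # unwanted column
                out = out (first ? "" : OFS) $i
                first = 0
            }
            print out
        }'
}
```

Feeding it a row `c1,c2,c3,c4,c5` with `drop_columns 2,5` leaves
`c1,c3,c4`, which matches the example above.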
