Hi,

Very often, while doing maintenance work on a data warehouse that uses
CSV files as input, I need to keep only the first line (the headers) and
the lines matching a given regexp, in order to save some time.

Here's a little helper to do this:

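require 'fileutils'

# Rewrites filename in place, keeping the header (first line) and any line
# matching regexp. The original content is left in filename + ".tmp".
def keep_headers_and_matching_lines(filename, regexp)
  tempfilename = filename + ".tmp"
  FileUtils.mv(filename, tempfilename)
  File.open(tempfilename) do |input|
    File.open(filename, 'w') do |output|
      input.each_with_index do |line, index|
        output << line if index == 0 || line =~ regexp
      end
    end
  end
end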

Typical use:

preprocess { keep_headers_and_matching_lines('mydata.csv',/customer/i) }
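
To try the filtering logic without touching a real file, here is a
minimal sketch that runs on an in-memory string. The name
copy_header_and_matching_lines and the sample rows are made up for
illustration, and it assumes Ruby 2.3 or later for the <<~ heredoc.

```ruby
require 'stringio'

# Hypothetical variant of the helper: copies the first line (the header)
# plus every line matching regexp from input to output.
# Works on any IO-like object, so it can be tried with StringIO.
def copy_header_and_matching_lines(input, output, regexp)
  input.each_line.with_index do |line, index|
    output << line if index == 0 || line =~ regexp
  end
  output
end

# Made-up sample data, for illustration only.
csv = <<~CSV
  id,type,name
  1,Customer,Alice
  2,supplier,Bob
  3,CUSTOMER,Carol
CSV

result = copy_header_and_matching_lines(StringIO.new(csv), StringIO.new, /customer/i).string
puts result
```

The header is kept even though it does not match the regexp, because
of the index == 0 test; the supplier row is dropped.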

(Sure, this can also be done with a grep call, which would be faster as
well, as long as the header line is kept too.)

In case it's useful to someone else!

cheers,

Thibaut Barrère
--
[blog] http://evolvingworker.com - tools for a better day
[blog] http://blog.logeek.fr - about writing software
_______________________________________________
Activewarehouse-discuss mailing list
Activewarehouse-discuss@rubyforge.org
http://rubyforge.org/mailman/listinfo/activewarehouse-discuss
