My only comment: should 'w' or 'wb' be used to open the temp file? This relates to the EOL issues I have been having between *nix and Windows.
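To make the question concrete, here is a minimal sketch (not the patch itself) of the chunk splitting with both files opened in binary mode. On Windows, Ruby's text mode translates "\n" to "\r\n" on write and back on read, while 'rb'/'wb' leave the bytes alone; on *nix the two modes treat line endings identically. The sketch assumes a Ruby recent enough that each_line returns an enumerator when called without a block, and it takes chunk_file as an explicit argument, whereas the patch reads it from an attribute.

```ruby
# Illustrative helper, not the code from the patch: copy +filename+ into
# +chunk_file+ at most +rows_per_chunk+ lines at a time, yielding the chunk
# file name and the number of rows it holds after each chunk is written.
def split_into_chunks(filename, chunk_file, rows_per_chunk)
  # 'rb' keeps "\r\n" sequences intact when reading on Windows.
  File.open(filename, 'rb') do |input|
    input.each_line.each_slice(rows_per_chunk) do |rows|
      # 'wb' prevents "\n" from being expanded to "\r\n" on Windows.
      File.open(chunk_file, 'wb') do |chunk|
        rows.each { |row| chunk << row }
      end
      yield chunk_file, rows.size
    end
  end
end
```

With binary mode on both sides, each chunk is a byte-for-byte slice of the input, so whatever line terminator the bulk file already uses is what the database sees. Whether that is the right behaviour depends on what the loader expects as :line_separator.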

Thanks
CW

On 9/20/07, Thibaut Barrère <[EMAIL PROTECTED]> wrote:
> Hi,
>
> to finally cope with bulk load issues on MySQL (lost connection etc),
> I've added the ability to split the file into chunks. It works this
> way:
>
> post_process :bulk_import, { :file => bulk_file, :columns => target_fields,
>   :field_separator => ',', :target => CONFIG, :table => table,
>   :rows_per_chunk => 10000 }
>
> rows_per_chunk defaults to false, which does not split the files at
> all (current behaviour).
>
> Is it interesting to others and should I commit this ? Any comments or
> remarks on naming or behaviour ? I'm pretty sure the code can be
> simplified (first version of the patch below if you care of the
> implementation details).
>
> cheers
> -- Thibaut
>
>
> @@ -21,6 +21,10 @@
>      attr_accessor :field_enclosure
>      # The line separator (defaults to a newline)
>      attr_accessor :line_separator
> +    # How many rows should be sent at a time (defaults to false => all rows in one chunk)
> +    attr_accessor :rows_per_chunk
> +    # Chunk file name (defaults to file + '.chunk' )
> +    attr_accessor :chunk_file
>
>      # Initialize the processor.
>      #
> @@ -33,7 +37,9 @@
>      # the bulk data file
>      # * <tt>:field_separator</tt>: The field separator. Defaults to a comma
>      # * <tt>:line_separator</tt>: The line separator. Defaults to a newline
> -    # * <tt>:field_enclosure</tt>: The field enclosure charcaters
> +    # * <tt>:field_enclosure</tt>: The field enclosure characters
> +    # * <tt>:rows_per_chunk</tt>: How many rows should be sent at a time (defaults to false => all rows in one chunk)
> +    # * <tt>:chunk_file</tt>: The chunk file name (defaults to file + '.chunk' ), when using lines_per_chunk
>      def initialize(control, configuration)
>        super
>        @file = File.join(File.dirname(control.file), configuration[:file])
> @@ -44,7 +50,8 @@
>        @field_separator = (configuration[:field_separator] || ',')
>        @line_separator = (configuration[:line_separator] || "\n")
>        @field_enclosure = configuration[:field_enclosure]
> -
> +      @rows_per_chunk = (configuration[:rows_per_chunk] || false)
> +      @chunk_file = (configuration[:chunk_file] || (@file + '.chunk' ))
>        raise ControlError, "Target must be specified" unless @target
>        raise ControlError, "Table must be specified" unless @table
>      end
> @@ -65,10 +72,34 @@
>            options[:fields][:enclosed_by] = field_enclosure if field_enclosure
>            options[:fields][:terminated_by] = line_separator if line_separator
>          end
> -        conn.bulk_load(file, table_name, options)
> +        split_into_chunks(file,rows_per_chunk) do |new_file,rows_count|
> +          puts "Bulk loading #{rows_count} rows..."
> +          conn.bulk_load(new_file, table_name, options)
> +        end
>        end
>      end
> -
> +
> +    # Split the file into rows_per_chunk, yield a temporary chunk filename each time
> +    def split_into_chunks(filename,rows_per_chunk)
> +      if rows_per_chunk
> +        File.open(filename) do |input|
> +          while not input.eof?
> +            rows_count = 0
> +            File.open(chunk_file,'w') do |chunk|
> +              while true
> +                chunk << input.gets
> +                rows_count += 1
> +                break if (input.lineno % rows_per_chunk == 0) || (input.eof?)
> +              end
> +            end
> +            yield chunk_file,rows_count
> +          end
> +        end
> +      else
> +        yield filename
> +      end
> +    end
> +
>      def table_name
>        ETL::Engine.table(table, ETL::Engine.connection(target))
>      end
> _______________________________________________
> Activewarehouse-discuss mailing list
> Activewarehouse-discuss@rubyforge.org
> http://rubyforge.org/mailman/listinfo/activewarehouse-discuss