Hi, to finally cope with bulk load issues on MySQL (lost connections, etc.), I've added the ability to split the bulk file into chunks. It works this way:

  post_process :bulk_import, {
    :file => bulk_file,
    :columns => target_fields,
    :field_separator => ',',
    :target => CONFIG,
    :table => table,
    :rows_per_chunk => 10000
  }

:rows_per_chunk defaults to false, which leaves the file unsplit (the current behaviour).

Is this useful to others, and should I commit it? Any comments or remarks on naming or behaviour?

I'm pretty sure the code can be simplified (the first version of the patch is below, if you care about the implementation details).

cheers

-- Thibaut

@@ -21,6 +21,10 @@
       attr_accessor :field_enclosure
       # The line separator (defaults to a newline)
       attr_accessor :line_separator
+      # How many rows should be sent at a time (defaults to false => all rows in one chunk)
+      attr_accessor :rows_per_chunk
+      # Chunk file name (defaults to file + '.chunk' )
+      attr_accessor :chunk_file
 
       # Initialize the processor.
       #
@@ -33,7 +37,9 @@
       #   the bulk data file
       # * <tt>:field_separator</tt>: The field separator. Defaults to a comma
       # * <tt>:line_separator</tt>: The line separator. Defaults to a newline
-      # * <tt>:field_enclosure</tt>: The field enclosure charcaters
+      # * <tt>:field_enclosure</tt>: The field enclosure characters
+      # * <tt>:rows_per_chunk</tt>: How many rows should be sent at a time (defaults to false => all rows in one chunk)
+      # * <tt>:chunk_file</tt>: The chunk file name (defaults to file + '.chunk' ), when using rows_per_chunk
       def initialize(control, configuration)
         super
         @file = File.join(File.dirname(control.file), configuration[:file])
@@ -44,7 +50,8 @@
         @field_separator = (configuration[:field_separator] || ',')
         @line_separator = (configuration[:line_separator] || "\n")
         @field_enclosure = configuration[:field_enclosure]
-
+        @rows_per_chunk = (configuration[:rows_per_chunk] || false)
+        @chunk_file = (configuration[:chunk_file] || (@file + '.chunk' ))
         raise ControlError, "Target must be specified" unless @target
         raise ControlError, "Table must be specified" unless @table
       end
@@ -65,10 +72,34 @@
             options[:fields][:enclosed_by] = field_enclosure if field_enclosure
             options[:fields][:terminated_by] = line_separator if line_separator
           end
-          conn.bulk_load(file, table_name, options)
+          split_into_chunks(file,rows_per_chunk) do |new_file,rows_count|
+            puts "Bulk loading #{rows_count} rows..."
+            conn.bulk_load(new_file, table_name, options)
+          end
         end
       end
-
+
+      # Split the file into rows_per_chunk, yield a temporary chunk filename each time
+      def split_into_chunks(filename,rows_per_chunk)
+        if rows_per_chunk
+          File.open(filename) do |input|
+            while not input.eof?
+              rows_count = 0
+              File.open(chunk_file,'w') do |chunk|
+                while true
+                  chunk << input.gets
+                  rows_count += 1
+                  break if (input.lineno % rows_per_chunk == 0) || (input.eof?)
+                end
+              end
+              yield chunk_file,rows_count
+            end
+          end
+        else
+          yield filename
+        end
+      end
+
       def table_name
         ETL::Engine.table(table, ETL::Engine.connection(target))
       end

_______________________________________________
Activewarehouse-discuss mailing list
Activewarehouse-discuss@rubyforge.org
http://rubyforge.org/mailman/listinfo/activewarehouse-discuss
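
On the "can be simplified" point, here is one possible shape for the splitting logic, sketched as a standalone method so it can be tried outside the processor. The name each_chunk and its chunk_file argument are made up for this illustration and are not part of the patch. Unlike the patch, it also yields a row count when no splitting is requested (at the cost of one extra pass over the file) and deletes the chunk file when done; both are assumptions about what would be wanted, not existing behaviour.

```ruby
# Hypothetical helper, not part of the patch: yields (path, row_count) once per
# chunk, or once for the whole file when rows_per_chunk is false or nil.
def each_chunk(filename, rows_per_chunk, chunk_file = filename + '.chunk')
  unless rows_per_chunk
    # No splitting requested: yield the original file. Counting its rows
    # needs one extra read of the file.
    yield filename, File.foreach(filename).count
    return
  end

  # File.foreach without a block returns an enumerator over the lines, so
  # each_slice groups them into batches of at most rows_per_chunk rows.
  File.foreach(filename).each_slice(rows_per_chunk) do |rows|
    # Overwrite the chunk file with the next batch of rows
    File.open(chunk_file, 'w') { |chunk| rows.each { |row| chunk << row } }
    yield chunk_file, rows.size
  end
ensure
  # Remove the temporary chunk file once all chunks have been consumed
  File.delete(chunk_file) if rows_per_chunk && File.exist?(chunk_file)
end
```

Inside the processor, the block passed to each_chunk would do the same thing as the one passed to split_into_chunks in the patch, i.e. call conn.bulk_load(new_file, table_name, options) for each yielded file.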