My only comment: should 'w' or 'wb' be used to open the temp file?  This
relates to the EOL issues I have been having between *nix and
Windows.
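
To make the question concrete, here is a minimal sketch using only Ruby's standard File API. On Windows, mode 'w' opens the file in text mode and writes "\n" as "\r\n", while 'wb' writes the bytes unchanged; on *nix the two modes behave the same. The helper name and the file name are made up for illustration and are not part of the patch.

```ruby
require 'tmpdir'

# Hypothetical helper: write +row+ to +path+ using +mode+, then read the
# file back in binary mode so we see exactly what landed on disk.
def write_and_read_raw(path, row, mode)
  File.open(path, mode) { |f| f << row }
  File.open(path, 'rb') { |f| f.read }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'rows.chunk') # hypothetical file name
  raw = write_and_read_raw(path, "1,foo\n", 'wb')
  # In binary mode the last byte is always 10 (LF), on every platform.
  puts raw.bytes.inspect
end
```

If the chunk is meant to be a byte-for-byte copy of part of the bulk file, 'wb' (and 'rb' on the input side) looks like the safer choice, since the line separator handed to the bulk loader then matches what is in the file.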

Thanks
CW

On 9/20/07, Thibaut Barrère <[EMAIL PROTECTED]> wrote:
> Hi,
>
> to finally cope with bulk load issues on MySQL (lost connection etc),
> I've added the ability to split the file into chunks. It works this
> way:
>
> post_process :bulk_import, { :file => bulk_file, :columns => target_fields,
>   :field_separator => ',', :target => CONFIG, :table => table,
> :rows_per_chunk => 10000 }
>
> rows_per_chunk defaults to false, which does not split the files at
> all (current behaviour).
>
> Is this of interest to others, and should I commit it? Any comments or
> remarks on naming or behaviour? I'm pretty sure the code can be
> simplified (first version of the patch below if you care about the
> implementation details).
>
> cheers
> -- Thibaut
>
>
> @@ -21,6 +21,10 @@
>        attr_accessor :field_enclosure
>        # The line separator (defaults to a newline)
>        attr_accessor :line_separator
> +      # How many rows should be sent at a time (defaults to false => all rows in one chunk)
> +      attr_accessor :rows_per_chunk
> +      # Chunk file name (defaults to file + '.chunk' )
> +      attr_accessor :chunk_file
>
>        # Initialize the processor.
>        #
> @@ -33,7 +37,9 @@
>        #   the bulk data file
>        # * <tt>:field_separator</tt>: The field separator. Defaults to a comma
>        # * <tt>:line_separator</tt>: The line separator. Defaults to a newline
> -      # * <tt>:field_enclosure</tt>: The field enclosure charcaters
> +      # * <tt>:field_enclosure</tt>: The field enclosure characters
> +      # * <tt>:rows_per_chunk</tt>: How many rows should be sent at a time (defaults to false => all rows in one chunk)
> +      # * <tt>:chunk_file</tt>: The chunk file name (defaults to file + '.chunk'), when using rows_per_chunk
>        def initialize(control, configuration)
>          super
>          @file = File.join(File.dirname(control.file), configuration[:file])
> @@ -44,7 +50,8 @@
>          @field_separator = (configuration[:field_separator] || ',')
>          @line_separator = (configuration[:line_separator] || "\n")
>          @field_enclosure = configuration[:field_enclosure]
> -
> +        @rows_per_chunk = (configuration[:rows_per_chunk] || false)
> +        @chunk_file = (configuration[:chunk_file] || (@file + '.chunk' ))
>          raise ControlError, "Target must be specified" unless @target
>          raise ControlError, "Table must be specified" unless @table
>        end
> @@ -65,10 +72,34 @@
>              options[:fields][:enclosed_by] = field_enclosure if field_enclosure
>              options[:fields][:terminated_by] = line_separator if line_separator
>            end
> -          conn.bulk_load(file, table_name, options)
> +          split_into_chunks(file,rows_per_chunk) do |new_file,rows_count|
> +            puts "Bulk loading #{rows_count} rows..."
> +            conn.bulk_load(new_file, table_name, options)
> +          end
>          end
>        end
> -
> +
> +      # Split the file into rows_per_chunk, yield a temporary chunk filename each time
> +      def split_into_chunks(filename,rows_per_chunk)
> +        if rows_per_chunk
> +          File.open(filename) do |input|
> +            while not input.eof?
> +              rows_count = 0
> +              File.open(chunk_file,'w') do |chunk|
> +                while true
> +                  chunk << input.gets
> +                  rows_count += 1
> +                  break if (input.lineno % rows_per_chunk == 0) || (input.eof?)
> +                end
> +              end
> +              yield chunk_file,rows_count
> +            end
> +          end
> +        else
> +          yield filename
> +        end
> +      end
> +
>        def table_name
>          ETL::Engine.table(table, ETL::Engine.connection(target))
>        end
> _______________________________________________
> Activewarehouse-discuss mailing list
> Activewarehouse-discuss@rubyforge.org
> http://rubyforge.org/mailman/listinfo/activewarehouse-discuss
>
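
On the "can be simplified" point, one possible shape is sketched below. It is not the committed implementation, only an illustration under a few assumptions: the bulk file is line-oriented, chunks should be copied byte for byte (hence 'rb' and 'wb'), and the chunk file name is passed as a third argument instead of being read from the processor's chunk_file accessor. It relies on IO being Enumerable, so each_slice groups the lines. It also always yields a row count; in the patch above, the non-chunked branch yields only the filename, so rows_count is nil inside the block and the message prints without a number.

```ruby
require 'tmpdir'

# Sketch: split +filename+ into files of at most +rows_per_chunk+ lines,
# yielding the chunk path and the number of rows it holds.
def split_into_chunks(filename, rows_per_chunk, chunk_file = filename + '.chunk')
  unless rows_per_chunk
    # No chunking requested: hand back the original file. Counting the rows
    # costs one extra pass over the file.
    yield filename, File.foreach(filename).count
    return
  end
  File.open(filename, 'rb') do |input|
    # IO#each yields lines, so each_slice yields arrays of up to N lines.
    input.each_slice(rows_per_chunk) do |rows|
      File.open(chunk_file, 'wb') { |chunk| rows.each { |row| chunk << row } }
      yield chunk_file, rows.size
    end
  end
end

# Usage with a throwaway five-row file: yields chunks of 2, 2 and 1 rows.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'bulk.txt') # hypothetical file name
  File.open(path, 'wb') { |f| 5.times { |i| f << "#{i},row\n" } }
  split_into_chunks(path, 2) { |chunk, rows| puts "#{chunk}: #{rows} rows" }
end
```

The same chunk file is overwritten on every pass, as in the patch, so the block has to finish loading a chunk before the next one is written.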