-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Attached is a patch with the following API change for
ActiveRecord::Base.create.
Currently ActiveRecord::Base.create takes either a hash of attributes or
an array of hashes of attributes. If you pass in an array it will treat
each hash individually and create/save each object separately. This is
really slow because it invokes a single INSERT statement for each hash
of attributes. The current class method 'create' also returns a model
object or an array of model objects for the hash(es) that were passed in.
This patch updates ActiveRecord::Base.create to take a third argument
which is an options hash. Have this take an :optimize key which can
point to one of the values: 'fast' or 'fastest'
'fast' would use the minimum amount of INSERT statements to create the
values and would still use the current validations/callback methods
'fastest' would use the minimum amount of INSERT statements and would
return the number of INSERT statements issued. It would not create any
model objects, and it would skip all validations/callbacks.
Currently 'fastest' is implemented in the patch. I expect to have 'fast'
done by midweek but I'd like some peer review and core acceptance before
I focus on doing to many more patches for this feature.
This also involves a patch to MysqlAdapter (and it will require a patch
for any other adapter to support this optimization) to support
intelligent communication with the database server to determine server
values like the maximum allowed packet size (which is used to create the
minimum number of INSERT statements).
Since dev.rubyonrails.com is down I figured I post this here. Below are
some benchmarking statistic.
Creation of 100 MyISAM records using AR now took 0.92 seconds
Creation of 100 MyISAM records using :optimize=>'fastest' took 0.10 seconds
Creation of 1000 MyISAM records using AR now took 5.41 seconds
Creation of 1000 MyISAM records using :optimize=>'fastest' took 0.17 seconds
Creation of 10000 MyISAM records using AR now took 57.46 seconds
Creation of 10000 MyISAM records using :optimize=>'fastest' took 2.45
seconds
The benchmarks for Memory and InnoDb are just as good (although InnoDb
does get a better speedup).
The change in code for the user looks like:
MyModel.create array_of_hashes, :optimize=>'fastest'
This is a patch off from trunk tonight, and it includes unit tests for
everything.
Could someone check this out and give some feedback? I have started to
develop this as a separate plugin for ActiveRecord, but the more I think
about it the more it seems that this sort of thing should be in core.
Thoughts?
Zach
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFEuuhyMyx0fW1d8G0RAiDTAJ9GceqhS2lyR4OL+FrajQjenLmPOACfVpia
Z4tSCY8kl/kncnbRe6CIuHE=
=4hcE
-----END PGP SIGNATURE-----
Index:
/home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/base.rb
===================================================================
---
/home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/base.rb
(revision 4610)
+++
/home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/base.rb
(working copy)
@@ -431,9 +431,23 @@
# Creates an object, instantly saves it as a record (if the validation
permits it), and returns it. If the save
# fails under validations, the unsaved object is still returned.
- def create(attributes = nil)
- if attributes.is_a?(Array)
+ # Creates and returns an object, given a hash of attributes and
instantly saves it as
+ # a record (if the validation permits it). If the save fails under
validations, the unsaved object is still around.
+ #
+ # Creates and returns an array of object, given an array of hash
attributes. By default
+ # this will validate each object created. If an object's save fails
under validations,
+ # the unsaved object is still around. (This is the default behavior)
+ #
+ # Creates and returns an array of objects given an array of hash
attributes, and passing in a
+ # hash of options. The options hash can include:
+ # * :optimize=>'best' - this will skip all validations and callbacks
and perform an optimized \
+ # insert operation given that the database adapter supports this
optimization. (Currently \
+ # only the Mysql adapter supports this feature).
+ def create(attributes = nil, *args )
+ if attributes.is_a?(Array) and args.empty?
attributes.collect { |attr| create(attr) }
+ elsif attributes.is_a?(Array) and args.first.is_a?( Hash )
+ create_many_with_options( attributes, args.first )
else
object = new(attributes)
scope(:create).each { |att,value| object.send("#{att}=", value) } if
scoped?(:create)
@@ -440,6 +454,40 @@
object.save
object
end
+ end
+
+ def create_many_with_options( objects, options ) # :nodoc:
+ if options[:optimize].to_s =~ /^fast$/
+ create_many( objects, options )
+ # elsif options[:optimize].to_s =~ /fastest$/
+ else
+ create_many_without_validations_or_callbacks( objects, options )
+ end
+ end
+
+ # TODO currently not supported, defaults to
create_many_without_validations_or_callbacks
+ def create_many( objects, options={} ) # :nodoc
+ create_many_without_validations_or_callbacks( objects, options )
+ end
+
+ # Returns the number of insert operations it took to create these
records without
+ # validations or callbacks. The underlying database adapter has to
support
+ # the method insert_many.
+ def create_many_without_validations_or_callbacks( objects, options ) #
:nodoc:
+ column_names = objects.first.keys
+ column_names_sql = column_names.map{ |c| connection.quote_column_name(
c ) }
+
+ values = objects.map{ |hsh|
+ column_names.map{ |colname|
+ connection.quote( hsh[colname], columns_hash[colname] ) } }
+ values_sql = values.map{ |arr| '(' + arr.join( ',' ) + ')' }
+
+ base_sql = "INSERT INTO #{table_name} (#{column_names_sql.join(', ')})
VALUES"
+ number_of_inserts = connection.insert_many(
+ base_sql,
+ values_sql,
+ "#{self.class.name} Create Without Validation" )
+ number_of_inserts
end
# Finds the record from the passed +id+, instantly saves it with the
passed +attributes+ (if the validation permits it),
Index:
/home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb
===================================================================
---
/home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb
(revision 4610)
+++
/home/zdennis/source/local_projects/rails_trunk/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb
(working copy)
@@ -197,6 +197,35 @@
execute(sql, name = nil)
id_value || @connection.insert_id
end
+
+ def insert_many( base_sql, values, *args )
+ # the number of inserts default
+ number_of_inserts = 0
+
+ # the number of bytes the requested insert statement values will take
up
+ values_in_bytes = self.class.sum_sizes( *values )
+
+ # the number of bytes (commas) it will take to comma separate our
values
+ comma_separated_bytes = values.size-1
+
+ # the total number of bytes required if this statement is one statement
+ total_bytes = base_sql.size + values_in_bytes + comma_separated_bytes
+
+ max = max_allowed_packet
+ # if we can insert it all as one statement
+ if total_bytes < max
+ number_of_inserts += 1
+ insert( base_sql + values.join( ',' ), *args )
+ else
+ value_sets = self.class.get_insert_value_sets( values,
base_sql.size, max )
+ value_sets.each do |values|
+ number_of_inserts += 1
+ insert( base_sql + values.join( ',' ), *args )
+ end
+ end
+
+ number_of_inserts
+ end
def update(sql, name = nil) #:nodoc:
execute(sql, name)
@@ -324,7 +353,41 @@
current_type = select_one("SHOW COLUMNS FROM #{table_name} LIKE
'#{column_name}'")["Type"]
execute "ALTER TABLE #{table_name} CHANGE #{column_name}
#{new_column_name} #{current_type}"
end
-
+
+ # MISC =================================================================
+
+ # Returns the sum of the sizes of the passed in objects. This should
+ # probably be moved outside this class, but to where?
+ def self.sum_sizes( *objects ) # :nodoc:
+ objects.inject( 0 ){|sum,o| sum += o.size }
+ end
+
+ # Returns the maximum number of bytes that the server will allow
+ # in a single packet
+ def max_allowed_packet # :nodoc:
+ result = execute( "SHOW VARIABLES like 'max_allowed_packet';" )
+ result.fetch_row[1].to_i
+ end
+
+ def self.get_insert_value_sets( values, sql_size, max_bytes )
+ value_sets = []
+ arr, current_arr_values_size, current_size = [], 0, 0
+ values.each_with_index do |val,i|
+ comma_bytes = arr.size-1
+ if sql_size + current_size + val.size + comma_bytes < max_bytes
+ current_size += val.size
+ arr << val
+ else
+ value_sets << arr
+ arr = [ val ]
+ current_size = val.size
+ end
+
+ # if we're on the last iteration push whatever we have in arr to
value_sets
+ value_sets << arr if i == (values.size-1)
+ end
+ [ *value_sets ]
+ end
private
def connect
Index:
/home/zdennis/source/local_projects/rails_trunk/activerecord/test/adapter_test_mysql.rb
===================================================================
---
/home/zdennis/source/local_projects/rails_trunk/activerecord/test/adapter_test_mysql.rb
(revision 0)
+++
/home/zdennis/source/local_projects/rails_trunk/activerecord/test/adapter_test_mysql.rb
(revision 0)
@@ -0,0 +1,45 @@
+
+class MysqlAdapterTest < Test::Unit::TestCase
+ include ActiveRecord::ConnectionAdapters
+
+ def setup
+ @connection = ActiveRecord::Base.connection
+ end
+
+ def test_get_insert_value_sets
+ values = [
+ "('1','2','3')",
+ "('4','5','6')",
+ "('7','8','9')" ]
+
+ values_size_in_bytes = MysqlAdapter.sum_sizes( *values )
+ base_sql_size_in_bytes = 15
+ max_bytes = 30
+
+ value_sets = MysqlAdapter.get_insert_value_sets( values,
base_sql_size_in_bytes, max_bytes )
+ assert_equal 3, value_sets.size, 'Three value sets were expected!'
+
+ # Each element in the value_sets array must be an array
+ value_sets.each_with_index { |e,i|
+ assert_kind_of Array, e, "Element #{i} was expected to be an Array!" }
+
+ # Each element in the values array should have a 1:1 correlation to the
elements
+ # in the returned value_sets arrays
+ assert_equal values[0], value_sets[0].first
+ assert_equal values[1], value_sets[1].first
+ assert_equal values[2], value_sets[2].first
+ end
+
+ def test_insert_many
+ base_sql = "INSERT INTO #{Topic.table_name} (`title`,`author_name`) VALUES
"
+ values = [
+ "('Morgawr','Brooks, Terry')",
+ "('Antrax', 'Brooks, Terry')",
+ "('Jarka Ruus', 'Brooks, Terry')" ]
+
+ current_number_of_topic_records = Topic.count
+ @connection.insert_many( base_sql, values )
+ assert_equal current_number_of_topic_records + 3, Topic.count, "Incorrect
number of records in the database!"
+ end
+
+end
\ No newline at end of file
Index:
/home/zdennis/source/local_projects/rails_trunk/activerecord/test/base_test.rb
===================================================================
---
/home/zdennis/source/local_projects/rails_trunk/activerecord/test/base_test.rb
(revision 4610)
+++
/home/zdennis/source/local_projects/rails_trunk/activerecord/test/base_test.rb
(working copy)
@@ -138,7 +138,33 @@
topic_reloaded = Topic.find(topic.id)
assert_equal("New Topic", topic_reloaded.title)
end
+
+ def test_create_many1
+ topic_hash = { :title => 'New Topic' }
+ topics_array = [ topic_hash, topic_hash, topic_hash ]
+ topic_count = Topic.count
+ number_of_expected_topics = topics_array.size + topic_count
+
+ Topic.create topics_array
+ assert_equal number_of_expected_topics, Topic.count
+
+ number_of_expected_topics += topics_array.size
+ Topic.create_many topics_array, :optimize=>'fastest'
+ assert_equal number_of_expected_topics, Topic.count
+ end
+
+ def test_create_many_with_options
+ topic_hash = { :title => 'New Topic' }
+ topics_array = [ topic_hash, topic_hash, topic_hash ]
+
+ topic_count = Topic.count
+ number_of_expected_topics = topics_array.size + topic_count
+
+ Topic.create_many_with_options topics_array, :optimize=>'fastest'
+ assert_equal number_of_expected_topics, Topic.count
+ end
+
def test_save!
topic = Topic.new(:title => "New Topic")
assert topic.save!
_______________________________________________
Rails-core mailing list
[email protected]
http://lists.rubyonrails.org/mailman/listinfo/rails-core