Thanks Roland, saving every X records drops the searching time. That's great. These are my benchmarks with 30k objects:
CoreData without saving every X insertions: about 5-6 minutes
CoreData with a save every 500 insertions: about 30 seconds
CoreData with an auxiliary index dictionary: about 2 seconds

However, that seems strange. According to http://cocoawithlove.com/2008/03/testing-core-data-with-very-big.html it should be much faster than that (very, very fast), and 30k objects is not many for Core Data. I will try to file a bug at bugreporter and hear what the Apple engineers say. (A project that implements your idea is available here: http://dl.dropbox.com/u/103260/CoreDataTreeTest3.zip; a rough sketch of the batched-save loop follows the quoted message below.)

On Sun, Feb 14, 2010 at 5:31 AM, Roland King <[email protected]> wrote:

> ok, I downloaded your project. I agree with Jerry there's a memory leak; actually it's worse than that, you aren't actually remembering the article in order to set its parent if you create it, so

> [ DDArticle newArticleWithID: messageid context:ctx ];

> should be

> article = [ DDArticle newArticleWithID: messageid context:ctx ];
> [ article release ];

> I got the test to run in 30 seconds, which isn't too bad as just looping over the articles takes about 7 seconds itself. Here's your problem: you're never saving the work, so you are building up all the articles you're adding in memory. Yes, the SQL store has an index on it, and yes, Core Data is issuing the correct SELECT, but there's nothing in the store. So as well as looking in the store, it also has to scan every one of the objects still waiting to be persisted. Clearly, even though it uses an index on the SQL side, it doesn't use the index hint to build an in-memory map for finding the in-memory objects which match a predicate. So your adds go slower and slower as Core Data each time does one SQL lookup in an always-empty database, which finds 0 objects in 0.0005 of a second, then goes scanning an ever-growing set of pending objects one by one. Since you never match, because your IDs are unique, it scans the whole set every time. If you log it you'll see it adding slower and slower each iteration.

> So I tried adding in [ archive save ] to make it commit and was surprised to find nothing changed, until I realized that [ archive save ] saves the wrong context; in fact your example code never saves anything to the DB at all! Adding this inside your add loop

> if( [ [ ctx updatedObjects ] count ] > 100 )
>     [ ctx save:nil ];

> means the working set is never larger than 100, so that limits the amount of in-memory lookup. Once the objects are cached in the DB, the SQL lookup piece is blisteringly quick, so your check for existing objects runs in nearly constant time. 100 is a parameter you can tweak; you could just save every single time, but that probably has overhead, and if you make it much larger than 100 you have the save overhead less often but you have to scan more in-memory objects. It's a compromise.

> 1000 checks and inserts a second seems about OK to me, and if you make sure to save the context regularly, you should be able to keep that rate up even as the database size grows.
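Putting the periodic save described above together with the auxiliary index dictionary from the benchmarks, the insert loop might look roughly like the sketch below (non-ARC, as in 2010-era code). It assumes the DDArticle class and its newArticleWithID:context: factory from the linked test project; the input enumeration, the parentid key, the setParent: accessor, the placeholder handling for unseen parents, and the batch size of 500 are illustrative assumptions rather than code taken from the project.

#import <CoreData/CoreData.h>
#import "DDArticle.h"                     // entity class from the test project

// ctx is the NSManagedObjectContext backed by the SQLite store.
NSMutableDictionary *articlesByID = [NSMutableDictionary dictionary];   // messageid -> DDArticle
NSUInteger batchSize = 500;               // illustrative; tune as discussed above

for (NSDictionary *row in parsedArticles) {               // hypothetical parsed input
    NSString *messageid = [row objectForKey:@"messageid"];
    if (messageid == nil)
        continue;

    // Duplicate check against the in-memory dictionary instead of a fetch request.
    DDArticle *article = [articlesByID objectForKey:messageid];
    if (article == nil) {
        article = [DDArticle newArticleWithID:messageid context:ctx];
        [articlesByID setObject:article forKey:messageid];
        [article release];                // the context and the dictionary keep it alive
    }

    NSString *parentID = [row objectForKey:@"parentid"];  // hypothetical key
    if (parentID != nil) {
        DDArticle *parent = [articlesByID objectForKey:parentID];
        if (parent == nil) {
            // assumption: create a placeholder for a parent we haven't seen yet
            parent = [DDArticle newArticleWithID:parentID context:ctx];
            [articlesByID setObject:parent forKey:parentID];
            [parent release];
        }
        [article setParent:parent];       // assumed accessor for the "parent" relationship
    }

    // Save in batches so the set of pending (unsaved) objects stays small.
    if ([[ctx insertedObjects] count] + [[ctx updatedObjects] count] >= batchSize) {
        NSError *error = nil;
        if (![ctx save:&error])
            NSLog(@"batch save failed: %@", error);
    }
}
[ctx save:nil];                           // flush the final partial batch

With the dictionary doing the duplicate check there is no per-insert fetch request at all, which is where the roughly two-second figure above comes from; the periodic save just keeps the context's dirty set, and therefore memory use, bounded.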
> On 14-Feb-2010, at 5:51 AM, daniele malcom wrote:

> Hi Roland, in fact the indices table exists (for the DDArticle entity):

> Enter SQL statements terminated with a ";"
> sqlite> .tables
> ZDDARTICLE    Z_METADATA    Z_PRIMARYKEY
> sqlite> .indices ZDDARTICLE
> ZDDARTICLE_ZMESSAGEID_INDEX
> ZDDARTICLE_ZPARENT_INDEX

> With my MacBook Pro, insertion of 30k articles took about 2-3 minutes. I've uploaded a test project: http://dl.dropbox.com/u/103260/CoreDataTreeTest.zip I really don't know why it should take this long, but in Instruments the big part is obviously the fetch used to search for the id and the parent.

> On Sat, Feb 13, 2010 at 2:53 PM, Roland King <[email protected]> wrote:

> .. oh and one other thing, there's a Core Data Instruments tool in Xcode; well, there is for OS X, not for iPhone OS, which I develop for, which may be why I never saw it before. You could try that.

> On 13-Feb-2010, at 9:36 PM, Roland King wrote:

> ok, I don't see anything wrong with the predicate code, but I'm no Core Data expert.

> I'll make one totally challengeable statement. Assuming that Core Data uses SQLite in a rational way to store objects (e.g. not storing everything as blobs of opaque data), for instance one table per entity where each column of the table is an attribute, and that evaluating the predicate does what you would expect it to do, i.e. uses SQL to do as much of the heavy lifting on a fetch request as possible, and that the column is indexed in the table and SQLite is using the index, then taking multiple minutes to find one row out of 20,000 just doesn't make any sense; it should take seconds at most.

> I believe Core Data does use table-per-entity. I think that partly because the documentation hints at it, partly because it makes sense, and partly because I looked at the implementation of one data model that I have.

> I can't see the point of making indexes if the predicate code doesn't generate SQL which uses them, but it's possible. It's possible that Core Data goes and loads all the entity rows, inspects their attributes by hand and filters them in code, but this is Apple, not Microsoft.

> So that leaves "column isn't indexed" as the most likely. But you've checked the 'indexed' box. Here's another wild-assed guess: does Core Data only create a store when you have no current store? It certainly checks to see if the store is compatible with the model, but as the indexed property is just a hint anyway, that store is compatible, just non-optimal. It's possible that if you created the store with the property defined as not indexed and only checked that box later, without regenerating the whole store, the index was never added. Did you do that, just check it later? Have you regenerated a complete new store since, or are you using a store you've been populating for a while?

> Here's a particularly ugly idea; purists please stop reading now. We can look at the store and see if it has an index on that property. First get up a terminal window and go to the path where your store is. I'm assuming you have sqlite3 installed like I do; it came with the OS as far as I know.

> Your store should be called something.sqlite, let's say it's Foo. Type

> sqlite3 Foo.sqlite

> and that should open the store and give you a prompt. First you want to find the tables in the store, so type

> .tables

> As far as I can see they are called Z<YOUR ENTITY NAME>, so for you I'd expect to see one of the tables called ZMCARTICLE. If there is one, you can find out what indices are on it:

> .indices ZMCARTICLE

> I believe again the indices are called Z<YOUR ENTITY NAME>_Z<YOUR ATTRIBUTE NAME>_INDEX, so you'd expect to find ZMCARTICLE_ZMESSAGEID_INDEX in that list. If you don't have it, the store wasn't created with that index.
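As a complement to inspecting the store with sqlite3, you can also ask the loaded model whether the attribute is marked as indexed at all. A minimal sketch, assuming the entity is named MCArticle and the attribute messageID as in the thread (the exact spelling in the real model is an assumption) and that model is the application's NSManagedObjectModel:

#import <CoreData/CoreData.h>

// The 'Indexed' checkbox in the model editor corresponds to -isIndexed on
// NSPropertyDescription. It is only a hint, applied when the SQLite store is
// created, which is why an older store can lack the index even if the box is
// checked now.
NSEntityDescription *entity = [[model entitiesByName] objectForKey:@"MCArticle"];
NSAttributeDescription *messageID = [[entity attributesByName] objectForKey:@"messageID"];
NSLog(@"messageID marked indexed in the model? %@", [messageID isIndexed] ? @"YES" : @"NO");

If the model says YES but .indices shows no ZMCARTICLE_ZMESSAGEID_INDEX, the store predates the checkbox, which is exactly the stale-store case described above.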
> If none of those tables exist at all, my rudimentary reverse engineering of the whole Core Data thing is flawed (or I'm using some entirely different version from you).

> If the tables and indices exist, including the one on ZMESSAGEID, I'm out of ideas, unless someone knows of a way to put Core Data into a form of debug mode and see the SQL it generates, to figure out whether it's doing anything smart.

> If either none of the above works, or it does work but you don't have the index, you have a couple of options. The right one is to delete your whole message store, run your app to make a brand new one, and see if that then adds the indexed property with an index. Depending on how you've populated the store, that might be a real pain; perhaps you can force a migration or something. The other, really stupid, idea would be to just add the index and hope that doesn't break everything entirely (which is entirely possible, at which point you delete the store and start over). You would do that by running

> CREATE INDEX ZMCARTICLE_ZMESSAGEID_INDEX ON ZMCARTICLE (ZMESSAGEID);

> Here's another useful thing I just came across; I would certainly run this to see if the SQL being executed makes sense:

> "With Mac OS X version 10.4.3 and later, you can use the user default com.apple.CoreData.SQLDebug to log to stderr the actual SQL sent to SQLite. (Note that user default names are case sensitive.) For example, you can pass the following as an argument to the application:

> -com.apple.CoreData.SQLDebug 1

> Higher levels of debug numbers produce more information, although using higher numbers is likely to be of diminishing utility."

> I'd love to hear about any other ways people have to debug Core Data. I sort of trust that Apple has done a good job with it, and for it to break down performance-wise on looking for a row in 20,000 with a certain attribute doesn't make sense to me. If you really can't get it to work, I'd write a short project which inserts 20,000 simple objects into a store and another one which opens the store and goes looking for one by attribute in the way you have. If it takes multiple minutes, I'd send it to Apple as a bug.
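For reference, the single-object lookup being timed here, the small test Roland suggests writing, would look roughly like the sketch below; the entity and attribute names follow the test project in the thread, and ctx and messageid stand for an already-configured NSManagedObjectContext and the ID being searched for (both assumptions of this sketch).

#import <CoreData/CoreData.h>

// Look up one article by its message ID. With the ZMESSAGEID index present
// and the pending-object set kept small by regular saves, this should be a
// single fast SELECT, visible with -com.apple.CoreData.SQLDebug 1.
NSFetchRequest *request = [[NSFetchRequest alloc] init];
[request setEntity:[NSEntityDescription entityForName:@"DDArticle"
                               inManagedObjectContext:ctx]];
[request setPredicate:[NSPredicate predicateWithFormat:@"messageID == %@", messageid]];
[request setFetchLimit:1];

NSError *error = nil;
NSArray *results = [ctx executeFetchRequest:request error:&error];
if (results == nil)
    NSLog(@"fetch failed: %@", error);
NSManagedObject *existing = [results lastObject];   // nil when there is no match
[request release];

If this fetch takes more than a few milliseconds against a saved store of 20,000 rows, the SQLDebug output above is the quickest way to see whether the generated SELECT is actually using the index.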
