Thanks Roland, saving every X records drops the searching time. That's great. These are my benchmarks with 30k objects:
CoreData without saving every X insertions: about 5-6 minutes
CoreData with a save every 500 insertions: about 30 seconds
CoreData with an auxiliary index dictionary: about 2 seconds

However, that seems strange. According to http://cocoawithlove.com/2008/03/testing-core-data-with-very-big.html it should be much faster than that (very, very fast), and 30k objects is not many for Core Data. I will try to file a bug at bugreporter and hear what the Apple engineers say. (A project that implements your idea is available here: http://dl.dropbox.com/u/103260/CoreDataTreeTest3.zip; a rough sketch of the batched-save loop follows the quoted message below.)

On Sun, Feb 14, 2010 at 5:31 AM, Roland King <[email protected]> wrote:

> ok, I downloaded your project. I agree with Jerry there's a memory leak; actually it's worse than that, you aren't actually remembering the article in order to set its parent if you create it, so

> [ DDArticle newArticleWithID: messageid context:ctx ];

> should be

> article = [ DDArticle newArticleWithID: messageid context:ctx ];
> [ article release ];

> I got the test to run in 30 seconds, which isn't too bad as just looping over the articles takes about 7 seconds itself. Here's your problem: you're never saving the work, so you are building up all the articles you're adding in memory. Yes, the SQL store has an index on it, and yes, Core Data is issuing the correct SELECT, but there's nothing in the store. So as well as looking in the store, it also has to scan every one of the objects still waiting to be persisted. Clearly, even though it uses an index on the SQL side, it doesn't use the index hint to build an in-memory map for finding the in-memory objects which match a predicate. So your adds go slower and slower as Core Data each time does one SQL lookup in an always-empty database, which finds 0 objects in 0.0005 of a second, then goes scanning an ever-growing set of pending objects one by one. Since you never match, because your IDs are unique, it scans the whole set every time. If you log it you'll see it adding slower and slower each iteration.

> So I tried adding in [ archive save ] to make it commit and was surprised to find nothing changed, until I realized that [ archive save ] saves the wrong context; in fact your example code never saves anything to the DB at all! Adding this inside your add loop

> if( [ [ ctx updatedObjects ] count ] > 100 )
>     [ ctx save:nil ];

> means the working set is never larger than 100, so that limits the amount of in-memory lookup. Once the objects are cached in the DB, the SQL lookup piece is blisteringly quick, so your check for existing objects runs in nearly constant time. 100 is a parameter you can tweak; you could just save every single time, but that probably has overhead, and if you make it much larger than 100 you have the save overhead less often but you have to scan more in-memory objects. It's a compromise.

> 1000 checks and inserts a second seems about OK to me, and if you make sure to save the context regularly, you should be able to keep that rate up even as the database size grows.
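Putting the periodic save described above together with the auxiliary index dictionary from the benchmarks, the insert loop might look roughly like the sketch below (non-ARC, as in 2010-era code). It assumes the DDArticle class and its newArticleWithID:context: factory from the linked test project; the input enumeration, the parentid key, the setParent: accessor, the placeholder handling for unseen parents, and the batch size of 500 are illustrative assumptions rather than code taken from the project.

#import <CoreData/CoreData.h>
#import "DDArticle.h"                     // entity class from the test project

// ctx is the NSManagedObjectContext backed by the SQLite store.
NSMutableDictionary *articlesByID = [NSMutableDictionary dictionary];   // messageid -> DDArticle
NSUInteger batchSize = 500;               // illustrative; tune as discussed above

for (NSDictionary *row in parsedArticles) {               // hypothetical parsed input
    NSString *messageid = [row objectForKey:@"messageid"];
    if (messageid == nil)
        continue;

    // Duplicate check against the in-memory dictionary instead of a fetch request.
    DDArticle *article = [articlesByID objectForKey:messageid];
    if (article == nil) {
        article = [DDArticle newArticleWithID:messageid context:ctx];
        [articlesByID setObject:article forKey:messageid];
        [article release];                // the context and the dictionary keep it alive
    }

    NSString *parentID = [row objectForKey:@"parentid"];  // hypothetical key
    if (parentID != nil) {
        DDArticle *parent = [articlesByID objectForKey:parentID];
        if (parent == nil) {
            // assumption: create a placeholder for a parent we haven't seen yet
            parent = [DDArticle newArticleWithID:parentID context:ctx];
            [articlesByID setObject:parent forKey:parentID];
            [parent release];
        }
        [article setParent:parent];       // assumed accessor for the "parent" relationship
    }

    // Save in batches so the set of pending (unsaved) objects stays small.
    if ([[ctx insertedObjects] count] + [[ctx updatedObjects] count] >= batchSize) {
        NSError *error = nil;
        if (![ctx save:&error])
            NSLog(@"batch save failed: %@", error);
    }
}
[ctx save:nil];                           // flush the final partial batch

With the dictionary doing the duplicate check there is no per-insert fetch request at all, which is where the roughly two-second figure above comes from; the periodic save just keeps the context's dirty set, and therefore memory use, bounded.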
> On 14-Feb-2010, at 5:51 AM, daniele malcom wrote:

> Hi Roland, in fact the indices table exists (for the DDArticle entity):

> Enter SQL statements terminated with a ";"
> sqlite> .tables
> ZDDARTICLE    Z_METADATA    Z_PRIMARYKEY
> sqlite> .indices ZDDARTICLE
> ZDDARTICLE_ZMESSAGEID_INDEX
> ZDDARTICLE_ZPARENT_INDEX

> With my MacBook Pro, insertion of 30k articles took about 2-3 minutes. I've uploaded a test project: http://dl.dropbox.com/u/103260/CoreDataTreeTest.zip I really don't know why it should take this long, but in Instruments the big part is obviously the fetch used to search for the id and the parent.

> On Sat, Feb 13, 2010 at 2:53 PM, Roland King <[email protected]> wrote:

> .. oh and one other thing, there's a Core Data Instruments tool in Xcode; well, there is for OS X, not for iPhone OS, which I develop for, which may be why I never saw it before. You could try that.

> On 13-Feb-2010, at 9:36 PM, Roland King wrote:

> ok, I don't see anything wrong with the predicate code, but I'm no Core Data expert.

> I'll make one totally challengeable statement. Assuming that Core Data uses SQLite in a rational way to store objects (e.g. not storing everything as blobs of opaque data), for instance one table per entity where each column of the table is an attribute, and that evaluating the predicate does what you would expect it to do, i.e. uses SQL to do as much of the heavy lifting on a fetch request as possible, and that the column is indexed in the table and SQLite is using the index, then taking multiple minutes to find one row out of 20,000 just doesn't make any sense; it should take seconds at most.

> I believe Core Data does use table-per-entity. I think that partly because the documentation hints at it, partly because it makes sense, and partly because I looked at the implementation of one data model that I have.

> I can't see the point of making indexes if the predicate code doesn't generate SQL which uses them, but it's possible. It's possible that Core Data goes and loads all the entity rows, inspects their attributes by hand and filters them in code, but this is Apple, not Microsoft.

> So that leaves "column isn't indexed" as the most likely. But you've checked the 'indexed' box. Here's another wild-assed guess: does Core Data only create a store when you have no current store? It certainly checks to see if the store is compatible with the model, but as the indexed property is just a hint anyway, that store is compatible, just non-optimal. It's possible that if you created the store with the property defined as not indexed and only checked that box later, without regenerating the whole store, the index was never added. Did you do that, just check it later? Have you regenerated a complete new store since, or are you using a store you've been populating for a while?

> Here's a particularly ugly idea; purists please stop reading now. We can look at the store and see if it has an index on that property. First get up a terminal window and go to the path where your store is. I'm assuming you have sqlite3 installed like I do; it came with the OS as far as I know.

> Your store should be called something.sqlite, let's say it's Foo. Type

> sqlite3 Foo.sqlite

> and that should open the store and give you a prompt. First you want to find the tables in the store, so type

> .tables

> As far as I can see they are called Z<YOUR ENTITY NAME>, so for you I'd expect to see one of the tables called ZMCARTICLE. If there is one, you can find out what indices are on it:

> .indices ZMCARTICLE

> I believe again the indices are called Z<YOUR ENTITY NAME>_Z<YOUR ATTRIBUTE NAME>_INDEX, so you'd expect to find ZMCARTICLE_ZMESSAGEID_INDEX in that list. If you don't have it, the store wasn't created with that index.
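As a complement to inspecting the store with sqlite3, you can also ask the loaded model whether the attribute is marked as indexed at all. A minimal sketch, assuming the entity is named MCArticle and the attribute messageID as in the thread (the exact spelling in the real model is an assumption) and that model is the application's NSManagedObjectModel:

#import <CoreData/CoreData.h>

// The 'Indexed' checkbox in the model editor corresponds to -isIndexed on
// NSPropertyDescription. It is only a hint, applied when the SQLite store is
// created, which is why an older store can lack the index even if the box is
// checked now.
NSEntityDescription *entity = [[model entitiesByName] objectForKey:@"MCArticle"];
NSAttributeDescription *messageID = [[entity attributesByName] objectForKey:@"messageID"];
NSLog(@"messageID marked indexed in the model? %@", [messageID isIndexed] ? @"YES" : @"NO");

If the model says YES but .indices shows no ZMCARTICLE_ZMESSAGEID_INDEX, the store predates the checkbox, which is exactly the stale-store case described above.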
> If none of those tables exist at all, my rudimentary reverse engineering of the whole Core Data thing is flawed (or I'm using some entirely different version from you).

> If the tables and indices exist, including the one on ZMESSAGEID, I'm out of ideas, unless someone knows of a way to put Core Data into a form of debug mode and see the SQL it generates, to figure out whether it's doing anything smart.

> If either none of the above works, or it does work but you don't have the index, you have a couple of options. The right one is to delete your whole message store, run your app to make a brand new one, and see if that then adds the indexed property with an index. Depending on how you've populated the store, that might be a real pain; perhaps you can force a migration or something. The other, really stupid, idea would be to just add the index and hope that doesn't break everything entirely (which is entirely possible, at which point you delete the store and start over). You would do that by running

> CREATE INDEX ZMCARTICLE_ZMESSAGEID_INDEX ON ZMCARTICLE (ZMESSAGEID);

> Here's another useful thing I just came across; I would certainly run this to see if the SQL being executed makes sense:

> "With Mac OS X version 10.4.3 and later, you can use the user default com.apple.CoreData.SQLDebug to log to stderr the actual SQL sent to SQLite. (Note that user default names are case sensitive.) For example, you can pass the following as an argument to the application:

> -com.apple.CoreData.SQLDebug 1

> Higher levels of debug numbers produce more information, although using higher numbers is likely to be of diminishing utility."

> I'd love to hear about any other ways people have to debug Core Data. I sort of trust that Apple has done a good job with it, and for it to break down performance-wise on looking for a row in 20,000 with a certain attribute doesn't make sense to me. If you really can't get it to work, I'd write a short project which inserts 20,000 simple objects into a store and another one which opens the store and goes looking for one by attribute in the way you have. If it takes multiple minutes, I'd send it to Apple as a bug.
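For reference, the single-object lookup being timed here, the small test Roland suggests writing, would look roughly like the sketch below; the entity and attribute names follow the test project in the thread, and ctx and messageid stand for an already-configured NSManagedObjectContext and the ID being searched for (both assumptions of this sketch).

#import <CoreData/CoreData.h>

// Look up one article by its message ID. With the ZMESSAGEID index present
// and the pending-object set kept small by regular saves, this should be a
// single fast SELECT, visible with -com.apple.CoreData.SQLDebug 1.
NSFetchRequest *request = [[NSFetchRequest alloc] init];
[request setEntity:[NSEntityDescription entityForName:@"DDArticle"
                               inManagedObjectContext:ctx]];
[request setPredicate:[NSPredicate predicateWithFormat:@"messageID == %@", messageid]];
[request setFetchLimit:1];

NSError *error = nil;
NSArray *results = [ctx executeFetchRequest:request error:&error];
if (results == nil)
    NSLog(@"fetch failed: %@", error);
NSManagedObject *existing = [results lastObject];   // nil when there is no match
[request release];

If this fetch takes more than a few milliseconds against a saved store of 20,000 rows, the SQLDebug output above is the quickest way to see whether the generated SELECT is actually using the index.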
