Re: Massive import in Django database
On 11/06/2014 at 15.14, John Carlo wrote:

> Hello everybody,
>
> I fell in love with Django two years ago and I've been using it for my
> job projects. In the past I found very useful information in this group, so a
> big thank you, guys!
>
> I have a small question.
> I have to import about 1.000.000 XML documents into the Django db (SQLite for
> local development, MySQL on the server).
>
> The model class is the following:
>
>     class Doc(models.Model):
>         doc_code = models.CharField(max_length=20, unique=True,
>                                     primary_key=True, db_index=True)
>         doc_text = models.TextField(null=True, blank=True)
>         related_doc = models.ManyToManyField('self', null=True, blank=True,
>                                              db_index=True)
>
> From what I know, bulk insertion is not possible because I have a
> ManyToManyField relation.

Actually, you *can* bulk insert. You just have to extract the m2m relation into an intermediate model (https://docs.djangoproject.com/en/dev/topics/db/models/#extra-fields-on-many-to-many-relationships). Bulk insert the Doc instances first, then the related_doc relations.

But if it's a one-time import job, then just start it Friday afternoon and skip the extra complexity.

Erik

-- 
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/4CFD77B4-D577-493B-B6E7-01219F5CF23D%40cederstrand.dk.
For more options, visit https://groups.google.com/d/optout.
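The following is a minimal sketch of the two-phase bulk insert described above, written under stated assumptions. Instead of declaring an explicit intermediate model as in the linked docs, it uses a variant of the same idea: it bulk-inserts rows into the intermediate model that Django already creates for the existing field, `Doc.related_doc.through`. For a self-referential relation, the columns of that model are `from_doc` and `to_doc`. Because `doc_code` is the primary key, the relation rows can be built from the codes alone, without reading the inserted docs back.

The sketch assumes:
* the tables start empty;
* each code has exactly one XML file;
* `parsed_docs` is an iterable of `(doc_code, doc_text, related_codes)` tuples produced by your own XML parsing;
* `myapp.models` is a hypothetical import path.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items, so a large iterable is never materialised at once."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def link_pairs(relations):
    """Turn {doc_code: [related codes]} into sorted (from_code, to_code) pairs.

    ManyToManyField('self') is symmetrical by default, so each link is stored
    in both directions; the set also drops duplicate links.
    """
    pairs = set()
    for code, related in relations.items():
        for rel in related:
            pairs.add((code, rel))
            pairs.add((rel, code))
    return sorted(pairs)


def bulk_import(parsed_docs, batch_size=500):
    """Two-phase import: Doc rows first, then the M2M rows. Assumes empty tables."""
    from myapp.models import Doc  # hypothetical app path; point it at your model

    relations = {}  # doc_code -> related codes; the texts are not kept around

    def doc_objects():
        for code, text, related_codes in parsed_docs:
            relations[code] = list(related_codes)
            yield Doc(doc_code=code, doc_text=text)

    # Phase 1a: one Doc per XML file, inserted in batches.
    for chunk in chunked(doc_objects(), batch_size):
        Doc.objects.bulk_create(chunk)

    # Phase 1b: like get_or_create in the original loop, give related codes
    # that have no XML file of their own an empty placeholder Doc.
    referenced = {rel for rels in relations.values() for rel in rels}
    placeholders = (Doc(doc_code=code) for code in sorted(referenced - set(relations)))
    for chunk in chunked(placeholders, batch_size):
        Doc.objects.bulk_create(chunk)

    # Phase 2: the relation rows, through the auto-created intermediate model.
    Through = Doc.related_doc.through
    rows = (Through(from_doc_id=a, to_doc_id=b) for a, b in link_pairs(relations))
    for chunk in chunked(rows, batch_size):
        Through.objects.bulk_create(chunk)
```

The sketch writes both directions of each link itself. This is because `.add()` on a symmetrical relation normally inserts both rows for you, but `bulk_create` on the through model does not. No `batch_size` is passed to `bulk_create`, so Django can still split each chunk further if the database (for example SQLite) limits the number of query parameters.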
Re: Re: Massive import in Django database
Hi, John:

Sorry! The pseudo code I wrote is not correct, and it's slow. I will come back tonight.

With regards,
Qiancong, Mo

From: moqianc...@gmail.com
Date: 2014-06-11 23:47
To: django-users
Subject: Re: Massive import in Django database
Re: Massive import in Django database
Hi, John:

I think your code is right, except that "Doc.object" should be "Doc.objects". The following pseudo code may be faster than what you wrote:

    doc_map = {}
    for each xml:
        extract from the xml data -> mydoc_code, mydoc_text, myRelated_doc_codes
        doc = Doc.objects.create(doc_code=mydoc_code, doc_text=mydoc_text)
        doc_map[mydoc_code] = (doc, myRelated_doc_codes)

    for (doc, rcodes) in doc_map.values():
        for rcode in rcodes:
            doc.related_doc.add(doc_map[rcode])
        doc.save()

I have checked it, and it's okay. The objects are cached in doc_map, so there is no need to query the database again for the related docs. This should speed things up.

With regards,
moqianc...@gmail.com

From: John Carlo
Date: 2014-06-11 21:14
To: django-users
Subject: Massive import in Django database

Hello everybody,

I fell in love with Django two years ago and I've been using it for my job projects. In the past I found very useful information in this group, so a big thank you, guys!

I have a small question. I have to import about 1.000.000 XML documents into the Django db (SQLite for local development, MySQL on the server).

The model class is the following:

    class Doc(models.Model):
        doc_code = models.CharField(max_length=20, unique=True,
                                    primary_key=True, db_index=True)
        doc_text = models.TextField(null=True, blank=True)
        related_doc = models.ManyToManyField('self', null=True, blank=True,
                                             db_index=True)

From what I know, bulk insertion is not possible because I have a ManyToManyField relation. So I have this simple loop (in pseudo code):

    for each xml:
        extract from the xml data -> mydoc_code, mydoc_text, myRelated_doc_codes
        myDoc = Doc.object.get_or_create(doc_code=mydoc_code)[0]
        myDoc.doc_text = mydoc_text
        for reldoc_code in myRelated_doc_codes:
            myRelDoc = Doc.object.get_or_create(doc_code=reldoc_code)[0]
            myDoc.related_doc.add(myRelDoc)
        myDoc.save()

Am I doing it right? Do you have any suggestions or recommendations? I fear that, since I have 1.000.000 docs to import, it will take a lot of time, especially in the get_or_create calls.

Thank you in advance, everybody!

John
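Both loops above leave the "extract from the xml data" step abstract. In addition, the second pass of the cached-map idea has two problems:
* `doc_map[rcode]` is a `(doc, codes)` tuple rather than a `Doc`.
* A related code that has no XML file of its own raises `KeyError`.

The following is a hedged sketch of both pieces. The XML layout it assumes is purely hypothetical: a `<doc>` root with `<code>`, `<text>`, and a `<related>` list of `<code>` elements. The element names must be adapted to the real files.

```python
import xml.etree.ElementTree as ET


def extract(xml_text):
    """Return (doc_code, doc_text, related_codes) for one XML document.

    The element names are hypothetical; adapt them to the real file layout.
    """
    root = ET.fromstring(xml_text)
    code = root.findtext("code")  # direct <code> child of the root only
    text = root.findtext("text")  # None when there is no <text> element
    related = [el.text for el in root.findall("related/code")]
    return code, text, related


def link_cached(doc_map):
    """Second pass over doc_map, which maps doc_code -> (doc, related_codes).

    Passes the Doc instance (element 0 of the tuple) to add(), skips codes
    that never got a Doc, and adds all targets with one add() call per doc.
    No save() is needed: add() writes the M2M rows itself.
    """
    for doc, rcodes in doc_map.values():
        targets = [doc_map[code][0] for code in rcodes if code in doc_map]
        if targets:
            doc.related_doc.add(*targets)
```

For files on disk, `ET.parse(path).getroot()` gives the same root element as `ET.fromstring`. This sketch still performs one `Doc.objects.create` per file plus at least one `add()` per linked doc, so it remains slow for a million documents. Skipping unknown codes is also a different choice from the original `get_or_create` loop, which creates an empty Doc for them.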