Re: Massive import in Django database

2014-06-12 Thread Erik Cederstrand
On 11/06/2014 at 15:14, John Carlo wrote:

> Hello everybody,
> 
> I fell in love with Django two years ago and have been using it for my
> work projects. I have found very useful information in this group in the
> past, so a big thank you, guys!
> 
> I have a small question.
> I have to import about 1,000,000 XML documents into a Django database
> (SQLite for local development, MySQL on the server).
> 
> The model class is the following:
> 
> class Doc(models.Model):
>     doc_code = models.CharField(max_length=20, unique=True,
>                                 primary_key=True, db_index=True)
>     doc_text = models.TextField(null=True, blank=True)
>     related_doc = models.ManyToManyField('self', null=True, blank=True,
>                                          db_index=True)
> 
> From what I know, bulk insertion is not possible because I have a
> ManyToManyField relation.

Actually, you *can* bulk insert. You just have to extract the m2m relation into 
an intermediate model 
(https://docs.djangoproject.com/en/dev/topics/db/models/#extra-fields-on-many-to-many-relationships).
Bulk insert the Doc instances first, then the related_doc relations. But if
it's a one-time import job, then just start it Friday afternoon and skip the
extra complexity.
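
A minimal sketch of the "documents first, relations second" idea, kept free of Django so it runs on its own. The helper name plan_bulk_rows and the sample codes are made up for illustration. It assumes each XML file has already been parsed into a (doc_code, doc_text, related_codes) tuple, and it emits every relation in both directions because a ManyToManyField to 'self' is symmetrical by default.

```python
def plan_bulk_rows(parsed_docs):
    """Turn parsed XML records into two flat row lists.

    parsed_docs: iterable of (doc_code, doc_text, related_codes) tuples.
    Returns (doc_rows, link_rows):
      doc_rows  - dict mapping every doc_code to its text (None for codes
                  that are only referenced and never parsed themselves)
      link_rows - sorted, de-duplicated list of (from_code, to_code) pairs,
                  stored in both directions
    """
    doc_rows = {}
    links = set()
    for code, text, related_codes in parsed_docs:
        doc_rows[code] = text
        for rel in related_codes:
            # Placeholder row for a document that has not been parsed (yet);
            # setdefault never overwrites text that is already known.
            doc_rows.setdefault(rel, None)
            links.add((code, rel))
            links.add((rel, code))
    return doc_rows, sorted(links)


# Hypothetical sample data standing in for the parsed XML files.
parsed = [
    ("A1", "first text", ["B2", "C3"]),
    ("B2", "second text", ["A1"]),
]
doc_rows, link_rows = plan_bulk_rows(parsed)
print(len(doc_rows), "document rows")   # 3 document rows
print(len(link_rows), "relation rows")  # 4 relation rows
```

With Django, the first list could then go to Doc.objects.bulk_create(...) and the second to the bulk_create of the intermediate model. For the automatic table that is Doc.related_doc.through, whose foreign keys should be named from_doc and to_doc for a model called Doc; with an explicit intermediate model the field names are whatever you declare, so treat those names as an assumption to check against your own schema.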

Erik

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/4CFD77B4-D577-493B-B6E7-01219F5CF23D%40cederstrand.dk.
For more options, visit https://groups.google.com/d/optout.


Re: Re: Massive import in Django database

2014-06-11 Thread moqianc...@gmail.com
Hi, John:
Sorry! The pseudo code I wrote is not correct, and it's slow. I will come
back to this tonight.

With Regards,
Qiancong,Mo



Re: Massive import in Django database

2014-06-11 Thread moqianc...@gmail.com
Hi, John:
I think your code is right, except "Doc.object" should be "Doc.objects".

The following pseudo code may be faster than what you wrote:

doc_map = {}
for each xml:
    extract from the xml data -> mydoc_code, mydoc_text, myRelated_doc_codes
    doc = Doc.objects.create(doc_code=mydoc_code, doc_text=mydoc_text)
    doc_map[mydoc_code] = (doc, myRelated_doc_codes)
for (doc, rcodes) in doc_map.values():
    for rcode in rcodes:
        # doc_map values are (doc, codes) tuples, so pass the object itself
        doc.related_doc.add(doc_map[rcode][0])
    # add() writes the relation immediately, so no extra save() is needed

I have checked it, and it's okay.
The objects are cached in doc_map, so there is no need to query the database
again for each related_doc code, which should speed things up.
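
The cached second pass can be sketched without a database. One case the pseudo code does not cover is a related code that never shows up as a document of its own: a plain doc_map[rcode] lookup would raise KeyError there. The sketch below collects such codes instead. The helper name resolve_links is hypothetical, and plain strings stand in for Doc instances.

```python
def resolve_links(doc_map):
    """Second pass: look up related documents in the in-memory cache.

    doc_map maps doc_code -> (doc, related_codes), as in the pseudo code.
    Returns (pairs, missing): pairs is a list of (doc, related_doc) tuples
    ready to be linked, missing is a sorted list of codes that were
    referenced but never imported.
    """
    pairs = []
    missing = set()
    for doc, related_codes in doc_map.values():
        for rcode in related_codes:
            entry = doc_map.get(rcode)  # dict lookup, no database query
            if entry is None:
                missing.add(rcode)
                continue
            pairs.append((doc, entry[0]))  # entry[0] is the cached object
    return pairs, sorted(missing)


# Strings stand in for Doc instances; "Z9" is referenced but never imported.
doc_map = {
    "A1": ("<Doc A1>", ["B2", "Z9"]),
    "B2": ("<Doc B2>", []),
}
pairs, missing = resolve_links(doc_map)
print(pairs)    # [('<Doc A1>', '<Doc B2>')]
print(missing)  # ['Z9']
```

Whether missing codes should be skipped, reported, or created as empty placeholder documents (which is what get_or_create does in the original loop) is a choice that depends on the data.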

With Regards.




moqianc...@gmail.com

From: John Carlo
Date: 2014-06-11 21:14
To: django-users
Subject: Massive import in Django database
Hello everybody,


I fell in love with Django two years ago and have been using it for my work
projects. I have found very useful information in this group in the past, so a
big thank you, guys!


I have a small question.
I have to import about 1,000,000 XML documents into a Django database (SQLite
for local development, MySQL on the server).


The model class is the following:


class Doc(models.Model):
    doc_code = models.CharField(max_length=20, unique=True, primary_key=True,
                                db_index=True)
    doc_text = models.TextField(null=True, blank=True)
    related_doc = models.ManyToManyField('self', null=True, blank=True,
                                         db_index=True)



From what I know, bulk insertion is not possible because I have a
ManyToManyField relation.


So I have this simple loop (in pseudo code)


for each xml:
    extract from the xml data -> mydoc_code, mydoc_text, myRelated_doc_codes

    myDoc = Doc.object.get_or_create(doc_code=mydoc_code)[0]
    myDoc.doc_text = mydoc_text

    for reldoc_code in myRelated_doc_codes:
        myRelDoc = Doc.object.get_or_create(doc_code=reldoc_code)[0]
        myDoc.related_doc.add(myRelDoc)

    myDoc.save()




Am I doing it right? Do you have any suggestions or recommendations? I fear
that, since I have 1,000,000 docs to import, it will take a lot of time,
especially in the get_or_create calls.
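
One common way to keep a million-row import from holding everything in memory, or sending it in one huge statement, is to process the rows in fixed-size batches. The sketch below shows only the batching; the helper name chunked and the batch size are illustrative assumptions, not something taken from this thread.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Seven items in batches of three: the last batch is shorter.
print(list(chunked(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

In a Django import, each batch of unsaved Doc objects could be passed to Doc.objects.bulk_create(batch). bulk_create also accepts a batch_size argument that splits the INSERT statements, but it still expects the whole list to exist in memory first, which is why a generator-based helper like this can be useful for very large inputs.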


Thank you in advance, everybody!


John








 