Re: bulk add m2m relationship for multiple instances

2015-09-08 Thread yakkadesign
I reviewed my code, and the slowdown I was talking about came from the 
|if a in myList| test.  I switched to the dict and I haven't noticed a 
performance hit from keeping a pointer to a list.  

I ended up switching to Postgres COPY for the importing.  It's a lot 
faster.  
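
For readers who want to try the COPY route, here is a rough sketch (the thread 
does not show the actual import code). It assumes a psycopg2 cursor, which 
provides copy_expert(). The helper name copy_m2m_pairs, the table name 
myapp_datapoint_sensors and its column names are made up for illustration.

```python
import csv
import io


def copy_m2m_pairs(cursor, table, columns, pairs):
    """Load rows into `table` with a single COPY ... FROM STDIN.

    `cursor` is assumed to be a psycopg2 cursor (it must provide
    copy_expert). `table` and `columns` are interpolated into the SQL,
    so they must come from trusted code, never from user input.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    count = 0
    for pair in pairs:
        writer.writerow(pair)
        count += 1
    buf.seek(0)  # rewind so COPY reads the buffer from the start
    sql = "COPY {} ({}) FROM STDIN WITH (FORMAT csv)".format(
        table, ", ".join(columns)
    )
    cursor.copy_expert(sql, buf)
    return count
```

A call might look like copy_m2m_pairs(cursor, "myapp_datapoint_sensors", 
["datapoint_id", "sensor_id"], [(1, 10), (1, 11)]). Note that COPY bypasses 
Django's model validation and signals entirely.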

Brian

  


-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To post to this group, send email to django-users@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/8c809a31-a71d-46ac-8713-8d82c780bcfa%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: bulk add m2m relationship for multiple instances

2015-08-13 Thread Erik Cederstrand

> On 13/08/2015 at 04:09, yakkades...@gmail.com wrote:
> 
> I'll run a test with the dict vs list+position counter. I know I saw a speed 
> improvement but I can't remember if that was the only thing I changed. 
> 
> I'd have to change a lot of code if I change the DB schema so I'm not wanting 
> to create an intermediate table. I'm going to go down the SQL path.

The intermediate model doesn't change the DB schema in your case. A 
models.ManyToManyField already implicitly creates a table in the DB to hold the 
m2m relation. The intermediate model just makes this explicit.

The only thing this changes in your code is that you can't do 
"my_datapoint.sensors.add(my_sensor)" anymore. You need to always create (and 
delete) a DatapointSensorRel explicitly.

Erik

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/ABFA88FA-3D1F-42F8-A634-9E04C0C6DCD3%40cederstrand.dk.


Re: bulk add m2m relationship for multiple instances

2015-08-13 Thread Derek
This Python wiki page 
(https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Choose_the_Right_Data_Structure) 
suggests:

* Membership testing with sets and dictionaries is much faster, O(1), than 
searching sequences, O(n). When testing "a in b", b should be a set or 
dictionary instead of a list or tuple.
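
A quick way to see the difference on your own machine is a small timing 
script like the one below. This is only a sketch: the container size and 
repeat count are arbitrary, and absolute timings will vary.

```python
import timeit

items = list(range(100000))
as_list = items
as_set = set(items)
needle = 99999  # worst case for the list: the last element

# Both containers give the same answer; only the lookup cost differs.
assert (needle in as_list) == (needle in as_set)

# The list is scanned element by element; the set uses a hash lookup.
list_time = timeit.timeit(lambda: needle in as_list, number=200)
set_time = timeit.timeit(lambda: needle in as_set, number=200)
print("list: {:.6f}s  set: {:.6f}s".format(list_time, set_time))
```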

This may or may not apply to your use case.   There is also more discussion 
in this thread:
http://bytes.com/topic/python/answers/587277-how-much-slower-dict-indexing-vs-list-indexing

I have also found in some of my own cases (not involving m2m models though) 
that dropping down to raw SQL for bulk uploads is fast enough to justify 
doing it; but I am trading off against all the model checks/balances that 
Django provides.

On Thursday, 13 August 2015 04:09:04 UTC+2, yakka...@gmail.com wrote:
>
> I'll run a test with the dict vs list+position counter. I know I saw a 
> speed improvement but I can't remember if that was the only thing I 
> changed. 
>
> I'd have to change a lot of code if I change the DB schema so I'm not 
> wanting to create an intermediate table. I'm going to go down the SQL path. 
> Let me know if you have any suggestions. I'm still a beginner with SQL. 
>
> Thanks for your help!!!
>
> Brian

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/4dab3118-8317-43ea-82a9-e425d4218b1b%40googlegroups.com.


Re: bulk add m2m relationship for multiple instances

2015-08-12 Thread yakkadesign
 

I'll run a test with the dict vs list+position counter. I know I saw a 
speed improvement but I can't remember if that was the only thing I 
changed. 


I'd have to change a lot of code if I change the DB schema, so I don't 
want to create an intermediate table. I'm going to go down the SQL path. 
Let me know if you have any suggestions. I'm still a beginner with SQL. 


Thanks for your help!!!


Brian



-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/92d75f9f-b406-4f53-ba52-773d50fe6564%40googlegroups.com.


Re: bulk add m2m relationship for multiple instances

2015-08-12 Thread Erik Cederstrand

> On 12/08/2015 at 20:00, yakkades...@gmail.com wrote:
> 
> In the actual code I create and preload all the DataPoints and Sensors 
> outside the loop.  I found a dict was too slow for DataPoints.

That's suspicious. Compared to loading data from the database, Python dicts are 
not slow, for any reasonable value of slow.

> Can you explain what you mean by “If you need better bulk-insert performance 
> than this, you can convert the m2m relation between Sensor and DataPoint to 
> an explicit m2m model. You can then bulk-insert all m2m relations in one go 
> instead of per-object”?   I’m not sure how to implement this.

Create an intermediate model for the m2m relation as described in 
https://docs.djangoproject.com/en/1.8/topics/db/models/#extra-fields-on-many-to-many-relationships

You can then bulk_create() on this model instead of using add(). Something like:


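  from django.db import models

  # Sensor is assumed to be an existing model defined elsewhere.
  # Note: Django 2.0+ also requires an on_delete argument on ForeignKey.
  class Datapoint(models.Model):
      sensors = models.ManyToManyField(Sensor, through='DatapointSensorRel')


  class DatapointSensorRel(models.Model):
      datapoint = models.ForeignKey(Datapoint)
      sensor = models.ForeignKey(Sensor)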


Used like this:

  relations = [
      DatapointSensorRel(datapoint=d, sensor=s)
      for d, s in my_collected_relations
  ]
  DatapointSensorRel.objects.bulk_create(relations)
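
One possible way to build the collection of (datapoint, sensor) pairs from 
parsed CSV rows is sketched below. The function name collect_relations and 
the two lookup dicts (keyed by timestamp and by (name, value)) are 
assumptions for illustration, not part of the original code.

```python
def collect_relations(rows, datapoints_map, sensors_map):
    """Return a (datapoint, sensor) pair for every reading in `rows`.

    rows: iterable of (timestamp, [(sensor_name, sensor_value), ...])
    datapoints_map: dict mapping timestamp -> preloaded datapoint
    sensors_map: dict mapping (name, value) -> preloaded sensor
    """
    pairs = []
    for timestamp, readings in rows:
        datapoint = datapoints_map[timestamp]
        for name, value in readings:
            pairs.append((datapoint, sensors_map[(name, value)]))
    return pairs


# With the models above, the result could then be inserted in batches:
#
#   relations = [
#       DatapointSensorRel(datapoint=d, sensor=s)
#       for d, s in collect_relations(rows, datapoints_map, sensors_map)
#   ]
#   DatapointSensorRel.objects.bulk_create(relations, batch_size=1000)
```

The batch_size argument to bulk_create() keeps each INSERT statement to a 
manageable size; 1000 is an arbitrary value here.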


Erik

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/E9071BDE-2AD3-4D77-8D9F-0ABD1791CDDB%40cederstrand.dk.


Re: bulk add m2m relationship for multiple instances

2015-08-12 Thread yakkadesign
Hi Erik,

In the actual code I create and preload all the DataPoints and Sensors 
outside the loop.  I found a dict was too slow for DataPoints.  I ended up 
sorting the DataPoints query by date and using the fact that they were in 
the same order as the CSV to speed things up.  
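
The ordered-pairing idea could look roughly like this. It is only a sketch: 
the helper name pair_in_order and the two key functions are made up for 
illustration, and the check is there so a mismatch in ordering fails loudly 
instead of silently pairing the wrong rows.

```python
def pair_in_order(csv_rows, datapoints, row_key, dp_key):
    """Pair CSV rows with preloaded objects that are sorted identically.

    row_key and dp_key extract the value (e.g. the timestamp) that both
    sequences are sorted by. Raises ValueError if they do not line up.
    """
    csv_rows = list(csv_rows)
    datapoints = list(datapoints)
    if len(csv_rows) != len(datapoints):
        raise ValueError("row count and datapoint count differ")
    for row, dp in zip(csv_rows, datapoints):
        if row_key(row) != dp_key(dp):
            raise ValueError("order mismatch at key {!r}".format(row_key(row)))
    return list(zip(csv_rows, datapoints))
```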

Looping through the datapoints and calling dp.Sensors.add on each one is too slow.  

Can you explain what you mean by “If you need better bulk-insert 
performance than this, you can convert the m2m relation between Sensor and 
DataPoint to an explicit m2m model. You can then bulk-insert all m2m 
relations in one go instead of per-object”?   I’m not sure how to implement 
this.  

Here is what the data will look like
time  sensorA  sensorB
___
1  45
2  64
3  92
4  37

The problem is that I'm calling DataPoint.add a lot.  It seems like there 
should be a more efficient way to add them instead of looping through each 
datapoint and calling .add.  I'd like to do something like bulk_create.

Brian

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/a08acbf0-6758-4187-89fc-f0c1f07a6a3c%40googlegroups.com.


Re: bulk add m2m relationship for multiple instances

2015-08-12 Thread Erik Cederstrand
> On 12/08/2015 at 04:47, yakkades...@gmail.com wrote:
> 
> for row in rows:
>     dp = DataPoint.objects.get(Taken_datetime=row['date'])
> 
>     sensorToAdd = []
>     for sensor in sensors:
>         s = Sensor.objects.get(Name=sensor.name, Value=sensor.value)
>         sensorToAdd.append(s)
> 
>     dp.Sensors.add(*sensorToAdd)
> 
> In the actual app I bulk create all the DataPoint and Sensor instances.  
> The problem is that |dp.Sensors.add(*sensorToAdd)| does a lot of hits on the 
> db.  I want a way to bulk add all the sensors.  

Try fetching all the data you need from the database up-front and place objects 
in a dict. Depending on the volume of your data, you may want to split this 
into reasonable batches:

  from django.db.models import Q

  datapoints = DataPoint.objects.filter(
      Taken_datetime__in={r['date'] for r in rows}
  )
  # Map each timestamp to its DataPoint for fast lookups
  datapoints_map = {d.Taken_datetime: d for d in datapoints}

  # To generate efficient SQL, make sure (name, value) pairs are unique
  unique_sensor_values = {(s.name, s.value) for s in my_list_of_sensors}
  sensors_q = Q()
  for name, value in unique_sensor_values:
      sensors_q |= Q(Name=name, Value=value)
  sensors = Sensor.objects.filter(sensors_q)
  # These are Sensor model instances, so use the model fields Name and Value
  sensors_map = {(s.Name, s.Value): s for s in sensors}

This reduces your queries to only two. You can then bulk-insert m2m relations 
per-object like this:

  for some_date, some_sensors in my_data:
      dp = datapoints_map[some_date]
      dp.Sensors.add(*[sensors_map[(s.name, s.value)] for s in some_sensors])
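
The batching mentioned above (splitting the up-front fetch into reasonable 
batches) could be done with a small helper like this one. The name chunked 
and the batch size of 500 are arbitrary choices for illustration.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Sketch of use with the queries above (DataPoint and rows as in the thread):
#
#   datapoints_map = {}
#   for batch in chunked({r['date'] for r in rows}, 500):
#       for d in DataPoint.objects.filter(Taken_datetime__in=batch):
#           datapoints_map[d.Taken_datetime] = d
```

This trades the single query for one query per batch, which keeps the IN 
clause from growing without bound on very large imports.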


If you need better bulk-insert performance than this, you can convert the m2m 
relation between Sensor and DataPoint to an explicit m2m model. You can then 
bulk-insert all m2m relations in one go instead of per-object.

You should probably add an index on DataPoint.Taken_datetime and a combined 
index on Sensor (Name, Value) to increase query performance.

Erik

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/7FCE3968-7C15-4D92-8DB3-E4A6EE56442B%40cederstrand.dk.