Re: Proposal on Custom Indexes - Google Summer of Code

Josh Smeaton Tue, 08 Mar 2016 18:20:42 -0800

Hi Akshesh,

The proposal looks really good so far. I haven't gone through it in depth 
yet, but I'll set aside some time to do so. A few comments:


- Using the schema editor rather than sqlcompiler is the right choice. 

- The earlier part of your proposal seems to focus on allowing postgres 
users to define more complex indexes that they previously haven't been able 
to use. Oracle is then highlighted also for allowing functional indexes. I 
think one of the biggest "why's" of this project is to allow users to 
define custom indexes. It doesn't matter what kind of index. They get 
access to the stuff that creates them. That should be highlighted before 
all else.

- Should the type of index (BTree, Spatial, etc) be an index type in and of 
itself, or should it be a property of a specific index? My knowledge here 
is limited, so I'm not sure if there are certain constraints on which kind 
of indexes can use spatial/btree/partial storage types. If certain index 
types are very heavily restricted, then it probably makes sense to 
highlight them (as you've done) as a top level class rather than an 
attribute of indexes in general. If most indexes can be arranged with 
different storage types though, it'd probably make more sense to expose 
that functionality as an attribute of all indexes.

- "Index() subclasses will also have a supports(backend) method which would 
return True or False depending on whether the backend supports that index 
type." Backends (operations.py) already support 
check_expression_support(self, expression). You can bake index support into 
these methods from within each backend. Especially helpful for third party 
backends, as they won't need to monkey patch the index classes. The only 
issue I can see with reusing check_expression_support is the overhead of 
all other expressions being checked. I don't think that's a big concern, 
but it's worth being conscious of.

- "Index classes will take 3 arguments fields, model and name - the later 
two being optional." Index classes should accept a list of expressions, 
not just field names. This will allow trivial support of functional indexes 
without a specific functional index class. You'll also benefit from 
existing expression subclasses which process field names as F() 
expressions. F() expressions may have to be changed (or a new concept 
introduced) which can resolve the field, but doesn't try to do any joining.

fullname_index=Index(ToLower('first_name'), ToLower('last_name')) -> create 
index <table_name>_fullname_index on <table_name>(Lower(first_name), 
Lower(last_name));

Once the index is added to a model, the model can inject itself into the 
index, or the schema editor can use the originating model to get data it 
needs. No need to pass the model into the index type I think. Regarding the 
name, I'm unsure if I prefer a name=Index() or an Index(name='') approach. 
It'd be good to see that choice in your proposal with a couple of pros and 
cons.
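
To make the expression-based idea above concrete, here is a minimal sketch in plain Python. It is not Django's API: Col, Lower, Index, contribute_to_model and create_sql are hypothetical stand-ins invented for illustration. It assumes bare field names resolve to columns on the index's own table (no joins), and that the model injects its table name and a default index name when the index is attached.

```python
# Hypothetical stand-ins for illustration only; none of this is Django code.

class Col:
    """A bare field name, resolved to a column on the index's own table (no joins)."""

    def __init__(self, name):
        self.name = name

    def as_sql(self):
        return self.name


def _wrap(expression):
    # Plain strings are treated as field names; anything else is assumed
    # to already be an expression object with an as_sql() method.
    return Col(expression) if isinstance(expression, str) else expression


class Lower:
    """A function expression wrapping another expression."""

    def __init__(self, expression):
        self.expression = _wrap(expression)

    def as_sql(self):
        return "LOWER(%s)" % self.expression.as_sql()


class Index:
    """Accepts expressions rather than only field names."""

    def __init__(self, *expressions, name=None):
        self.expressions = [_wrap(e) for e in expressions]
        self.name = name
        self.table = None  # injected later by the model

    def contribute_to_model(self, table, attr_name):
        # The model injects itself; the index never takes a model argument.
        self.table = table
        if self.name is None:
            self.name = attr_name

    def create_sql(self):
        if self.table is None:
            raise ValueError("index is not attached to a model yet")
        columns = ", ".join(e.as_sql() for e in self.expressions)
        return "CREATE INDEX %s_%s ON %s (%s)" % (
            self.table, self.name, self.table, columns)


fullname_index = Index(Lower('first_name'), Lower('last_name'))
fullname_index.contribute_to_model('person', 'fullname_index')
print(fullname_index.create_sql())
```

With this shape a functional index needs no dedicated class, and Index('field_a', 'field_b') covers the plain multi-column case through the same code path.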


Ok, onto the questions you actually asked.

1) I don't think it's necessary to have an IndexTogether type. 
Index('field_a', 'field_b') should be sufficient. If it's not 
sufficient, call out why. I'm also in favour of translating any 
index_together definitions into appropriate Index types. Then SchemaEditor 
only needs to work with Index expressions and things remain simpler at the 
boundary between model definition and migrations.

2) Again I think it's a good idea to internally create index classes based 
on older (unique=True) definitions. Unique may not have to have its own 
index type as it could (maybe) be a property of an Index. Again, I'll leave 
that choice up to you, but it'd be good to see the pros/cons briefly 
discussed.
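
As a rough illustration of both answers, the sketch below normalizes the older definitions into a single list of index objects, with uniqueness as a property rather than a separate class. Everything here is hypothetical: this Index dataclass and the normalize_indexes helper are invented for the example and are not part of Django. Dropping duplicates is one assumed way to avoid generating the same index twice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Index:
    """Hypothetical index description; `unique` is a property, not a subclass."""
    fields: tuple
    unique: bool = False


def normalize_indexes(index_together=(), unique_fields=(), indexes=()):
    """Translate legacy definitions into one de-duplicated list of Index objects.

    index_together: iterable of field-name sequences (legacy Meta.index_together)
    unique_fields:  names of fields declared with unique=True
    indexes:        Index objects declared explicitly
    """
    candidates = list(indexes)
    for fields in index_together:
        candidates.append(Index(tuple(fields)))
    for field in unique_fields:
        candidates.append(Index((field,), unique=True))

    # Keep the first occurrence of each definition so the schema editor
    # only ever sees one index per (fields, unique) combination.
    seen = set()
    result = []
    for index in candidates:
        if index not in seen:
            seen.add(index)
            result.append(index)
    return result


normalized = normalize_indexes(
    index_together=[('field_a', 'field_b')],
    unique_fields=['name'],
    indexes=[Index(('name',), unique=True)],
)
print(normalized)
```

The schema editor would then only need to handle Index objects, whichever syntax the model author used.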

Again, really good proposal!

On Wednesday, 9 March 2016 01:46:02 UTC+11, akki wrote:
>
> Hi
>
> My name is Akshesh Doshi (akki). I am a student at Indian Institute Of 
> Technology, Roorkee (IITR). I have been contributing to Django 
> <https://github.com/django/django/pulls/akki> for quite some time now and 
> my experience has been really great till now. I found the community to be 
> very welcoming and have learnt a lot in this period of time.
>
> With this spirit I would like to work on the idea of "Custom Indexes" 
> <https://code.djangoproject.com/wiki/SummerOfCode2016#Customindexes>  and 
> extend it as my proposal for Google Summer of Code 2016 
> <https://summerofcode.withgoogle.com/>.
>
> I have started preparing my proposal and here is the initial draft 
> <https://gist.github.com/akki/b438292c2c3cf199012f> of it. I would like 
> to hear your thoughts regarding this. Also I wanted to discuss some points 
> mentioned below. The timeline still needs work as I am still digging into 
> the code of expressions to see how we can use them in FunctionalIndex. Any 
> pointers/thoughts on that would be appreciated.
>
> Key points:
>   - Introduction to class-based indexes.
>   - Support for custom classes to db_index.
>   - Introduction of Meta.indexes.
>   - Allowing fields to specify their own index type.
>   - Support for an indexes/constraints API.
>   - Extend expressions into indexes.
>   - Bring spatial indexes under the hood.
>
> Points I would like to discuss:
>  1) Would it be right to *create an IndexTogether class* (subclass of 
> Index) and internally translate Meta.index_together to it?
>       This would allow something like -
>           class Meta:
>               indexes = [IndexTogether(['field1', 'field2'])]
>     I think this would let us keep better track of any indexes 
> (including those by `index_together`) that are being created. 
> `Meta.index_together` would be internally translated to Meta.indexes. This 
> might also be followed by deprecation of `index_together` if we want.
>
>  2) *Handling Unique constraints via indexes.*
>       For handling of constraints, I was thinking of creating a 
> UniqueIndex class which would handle any unique constraint on any column 
> and allow other options, such as making it deferrable.
>       Right now fields with ``unique=True`` apply the unique constraints 
> by using both the UNIQUE constraint in the CREATE TABLE statement and by 
> creating an index for it. For example, in this model 
> <https://gist.github.com/akki/56b6c3cac56073c9bf4d>, multiple (repeating) 
> indexes are being generated for the `name` field (one by UNIQUE constraint, 
> 2 others manually, on postgresql). This takes more space and is also not 
> good performance-wise. This situation can also be mitigated by keeping 
> track of all unique constraints in only one place.
>       So I was thinking of bringing the unique constraint totally under 
> indexes. Maybe some `models.UniqueIndex()` could be used in Meta.indexes to 
> add constraints? Any thoughts on it?
>
>
> I had also prepared a DEP (based on an earlier DEP by Marc Tamlyn), which 
> is now a subset of this proposal.
>
> Regards
> Akshesh Doshi
> (akki)
>

