Hi ecognium, On Tue, Jun 23, 2009 at 6:11 PM, ecognium <[email protected]> wrote:
> > Thanks again - this is very helpful. I will let you know if i run into > any future index creation errors as it could have been caused by any > number of other entries - i mistakenly thought it was all these categ > list-based entries. > > So if i understand it right even with a 10 element list for keywords, > there will only be 10 rows when 4 categ fields are used. Correct - 4C4 * 10C1 index entries for the custom index you specified earlier. > In the event > I use 'categ' only once in my query along with keywords field, it > will have up to 40 rows (10 from keywords and 4C1 from categ list). Correct. > Am > I adding these up right? > > I do not see myself going beyond 6 elements in the categ list at this > point (I guess the max will be 6C3 = 20 under such a situation). In the case of your index with 4 occurrences, this will rather be 6C4. > The > keyword list will be probably go into the 20s but do not see anything > beyond that and will always be used only once in the query. > > Thanks, > -e > > On Jun 23, 3:53 am, "Nick Johnson (Google)" <[email protected]> > wrote: > > Hi ecognium, > > > > > > > > On Tue, Jun 23, 2009 at 1:35 AM, ecognium <[email protected]> wrote: > > > > > Thanks, Nick. Let me make sure I understand your comment correctly. > > > Suppose I have the following data: > > > > > ID BlobProp1 BlobProp2-N Keywords > > > Categ > > > ================================================= > > > 123 blah blah tag1,tag2,tag3 > > > Circle, > > > Red, Large, Dotted > > > 345 blah blah tag3,tag4,tag5 > > > Square, Blue, Small, Solid > > > 678 blah blah tag1,tag3,tag4 > > > Circle, Blue, Small, Solid > > > > > > ------------------------------------------------------------------------------------------------------------- > > > > > The field categ (list) contains four different types - Shape, Color, > > > Size and Line Type. Suppose the user wants to retrieve all entities > > > that are Small Dotted Blue Circles then the query will be: > > > > > Select * From MyModel where categ = "Circle" AND categ = "Small" AND > > > categ = "Blue" AND categ = "Dotted" > > > > > When I was reading about exploding indexes the example indicated the > > > issue was due to Cartesian product of two list elements. I thought the > > > same will hold true with one list field when used multiple times in a > > > query. > > > > That is indeed true, though it's not quite the cartesian product - the > > datastore won't bother indexing (Circle, Circle, Circle, Circle), or > > (Dotted, Dotted, Dotted, Dotted) - it only indexes every unique > combination, > > which is a substantially smaller number than the cartesian product. It's > > still only tractable for small lists, though, such as the 4 item lists > > you're dealing with. > > > > Are you saying the above query will not need {Circle, Red, > > > > > Large, Dotted} * {Circle, , , } * {Circle, , , } * {Circle, , , } > > > number of index entities for entity ID=123? > > > > Correct - if you're not specifying a sort order, you can execute the > query > > without any composite indexes whatsoever. The datastore satisfies > > equality-only queries using a merge join strategy. > > > > > I was getting index errors > > > when I was using the categ list property four times in my index > > > specification and that's why I was wondering if I should restructure > > > things. > > > > How many items did you have in the list you were indexing in that case? > If > > your list has 4 items and your index specification lists it 4 times, you > > should only get one index entry. > > > > so I am guessing the following spec should not cause any index > > > > > issues in the future? > > > > Again, that depends on the number of entries in the 'categ' list. With 4 > > entries, this will only generate a single index entry, but the number of > > entries will expand exponentially as the list increases in size. > > > > -Nick Johnson > > > > > > > > > > > > > - kind: MyModel > > > properties: > > > - name: categ > > > - name: categ > > > - name: categ > > > - name: categ > > > - name: keywords > > > - name: __key__ # used for paging > > > > > Thanks, > > > -e > > > > > On Jun 22, 2:10 am, "Nick Johnson (Google)" <[email protected]> > > > wrote: > > > > Hi ecognium, > > > > > > If I understand your problem correctly, every entity will have 0-4 > > > entries > > > > in the 'categ' list, corresponding to the values for each of 4 > categories > > > > (eg, Color, Size, Shape, etc)? > > > > > > The sample query you give, with only equality filters, will be > > > satisfiable > > > > using the merge join query planner, which doesn't require custom > indexes, > > > so > > > > you won't have high indexing overhead. There will simply be one index > > > entry > > > > for each item in each list. > > > > > > If you do need custom indexes, the number of index entries, isn't > 4^4, as > > > > you suggest, but rather smaller. Assuming you want to be able to > query > > > with > > > > any number of categories from 0 to 4, you'll need 3 or 4 custom > indexes > > > > (depending on if the 0-category case requires its own index), and the > > > total > > > > number of index entries will be 4C1 + 4C2 + 4C3 + 4C4 = 4 + 6 + 4 + 1 > = > > > 15. > > > > For 6 categories, the number of entries would be 6 + 15 + 20 + 15 + 6 > + 1 > > > = > > > > 63, which is still a not-unreasonable number. > > > > > > -Nick Johnson > > > > > > On Mon, Jun 22, 2009 at 8:51 AM, ecognium <[email protected]> > wrote: > > > > > > > Hi All, > > > > > > > I would like to get your opinion on the best way to structure my > > > > > data model. > > > > > My app allows the users to filter the entities by four category > types > > > > > (say A,B,C,D). Each category can have multiple values (for e.g., > > > > > category type A can have values 1,2,3) but the > > > > > user can choose only one value per category for filtering. Please > > > > > note the values are unique across the category types as well. I > could > > > > > create four fields corresponding to the four types but it does not > > > > > allow me to expand to more categories later easily. Right now, I > just > > > > > use one list field to store the different values as it is easy to > add > > > > > more category types later on. > > > > > > > My model (simplified) looks like this: > > > > > > > class Example(db.Model): > > > > > > > categ = db.StringListProperty() > > > > > > > keywords = db.StringListProperty() > > > > > > > The field keywords will have about 10-20 values for each entity. > For > > > > > the above example, categ will have up to 4 values. Since I allow > for > > > > > filtering on 4 category types, the index table gets large with > > > > > unnecessary values. The filtering logic looks like: > > > > > keyword = 'k' AND categ = '1' AND categ = '9' AND categ = '14' AND > > > > > categ = '99' > > > > > > > Since there are 4 values in the categ list property, there will be > > > > > 4^4 rows created in the index table (most of them will never be hit > > > > > due to the uniqueness guaranteed by design). Multiply it by the > number > > > > > of values in the keywords table, the index table gets large very > > > > > quickly. > > > > > > > I would like to avoid creating multiple fields if possible because > > > > > when I want to make the number of category types to six, I would > have > > > > > to change the underlying model and all the filtering code. Any > > > > > suggestions on how to construct the model such that it will allow > for > > > > > ease of expansion in category types yet still not create large > index > > > > > tables? I know there is a Category Property but not sure if it > really > > > > > provides any specific benefit here. > > > > > > > Thanks! > > > > > -e > > > > > > -- > > > > Nick Johnson, App Engine Developer Programs Engineer > > > > Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration > > > Number: > > > > 368047 > > > > -- > > Nick Johnson, App Engine Developer Programs Engineer > > Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration > Number: > > 368047 > > > -- Nick Johnson, App Engine Developer Programs Engineer Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047 --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Google App Engine" group. To post to this group, send email to [email protected] To unsubscribe from this group, send email to [email protected] For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en -~----------~----~----~----~------~----~------~--~---
