There are two cases where the code is attempting to use indexes:

1) When deleting a type, find and delete its anonymous subtypes.
2) When deleting a type, confirm that it is not used as a nested type of another type.
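
As a rough illustration of these two cases without any secondary index, here is a minimal sketch. It assumes an in-memory map from type name to a record holding a list of (fieldname, fieldtype) pairs. The names `Datatype`, `PRIMITIVES`, `find_parents`, and `delete_type` are hypothetical and are not taken from the Asterix code base.

```python
from dataclasses import dataclass

# Illustrative subset of primitive type names (an assumption for this sketch).
PRIMITIVES = {"int32", "string", "double", "boolean"}


@dataclass
class Datatype:
    name: str
    fields: list  # list of (fieldname, fieldtype) pairs
    anonymous: bool = False


def find_parents(types, name):
    """Case 2: scan every type for a field whose fieldtype is `name`.

    This is a full scan over all types, which is the slow path that an
    index on the nested fieldtype would avoid.
    """
    return [
        t.name
        for t in types.values()
        if t.name != name and any(ftype == name for _, ftype in t.fields)
    ]


def delete_type(types, name):
    """Delete `name` unless another type nests it (case 2), then
    recursively delete its anonymous nested types (case 1)."""
    parents = find_parents(types, name)
    if parents:
        raise ValueError(f"{name} is still used by {parents}")
    removed = types.pop(name)
    for _, ftype in removed.fields:
        # Skip primitives and anything already deleted earlier in this loop.
        if ftype in PRIMITIVES or ftype not in types:
            continue
        if types[ftype].anonymous:
            delete_type(types, ftype)
```

The parent type is removed before recursing, so the usage check on each anonymous subtype no longer sees the parent that owned it.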

Ignoring the "indexes" that we have in Metadata, Datatype records have a field called "Fields" which contains a list of the fields within the type. Each value in this list has a "fieldname" and a "fieldtype".

For 1, we can simply iterate through this list and call delete recursively when "fieldtype" is both non-primitive and anonymous.

For 2, we need some way to find parent types given a type. The only way to do this quickly would be to create an index on Fields.fieldtype, which is a field within a record within a list.

Steven

On Monday, December 14, 2015, Mike Carey <[email protected]> wrote:

> Can you briefly explain why option 3 is so heavy? (Remind us how the use
> info is modeled?)
>
> On 12/14/15 3:43 PM, Steven Jacobs wrote:
>
>> We just had a UCR discussion on this topic. The issue is really with the
>> third "index" here. The code now is using one "index" to go in two
>> directions:
>> 1) To find datatypes that use datatype A
>> 2) To find datatypes that are used by datatype A.
>>
>> The way that it works now is hacked together, but designed for
>> performance. So we have three choices here:
>>
>> 1) Stick to the status quo, and leave the "indexes" as they are
>> 2) Remove the Metadata secondary indexes, which will eliminate the hack
>> but cost some performance on Metadata
>> 3) Implement the Metadata secondary indexes correctly as Asterix indexes.
>> For this solution to work with our dataset designs, we will need to have
>> the ability to index homogeneous lists. In addition, we will have reverse
>> compatibility issues unless we plan things out for the transition.
>>
>> What are the thoughts?
>>
>> Orthogonally, it seems that the consensus for storing the datatype
>> dataverse in the dataset Metadata is to just add it as an open field at
>> least for now. Is that correct?
>>
>> Steven
>>
>> On Mon, Dec 14, 2015 at 1:23 PM, Mike Carey <[email protected]> wrote:
>>
>>> Thoughts inlined:
>>>
>>> On 12/14/15 11:12 AM, Steven Jacobs wrote:
>>>
>>>> Here are the conclusions that Ildar and I have drawn from looking at the
>>>> secondary indexes:
>>>>
>>>> First of all it seems that datasets are local to node groups, but
>>>> dataverses can span node groups, which seems a little odd to me.
>>>
>>> Node groups are an undocumented but to-be-exploited-someday feature that
>>> allows datasets to be stored on less than all nodes in a given cluster. As
>>> we face bigger clusters, we'll want to open up that possibility. We will
>>> hopefully use them inside w/o having to make users manage them manually
>>> like parallel DB2 did/does. Dataverses are really just a namespace thing,
>>> not a storage thing at all, so they are orthogonal to (and unrelated to)
>>> node groups.
>>>
>>>> There are three Metadata secondary indexes: GROUPNAME_ON_DATASET_INDEX,
>>>> DATATYPENAME_ON_DATASET_INDEX, DATATYPENAME_ON_DATATYPE_INDEX
>>>>
>>>> The first is used in only one case:
>>>> When dropping a node group, check if there are any datasets using this
>>>> node group. If so, don't allow the drop
>>>> BUT, this index has a field called "dataverse" which is not used at all.
>>>
>>> This one seems like a waste of space since we do this almost never. (Not
>>> much space, but unnecessary.) If we keep it it should become a proper
>>> index.
>>>
>>>> The second is used when dropping a datatype. If there is a dataset using
>>>> this datatype, don't allow the drop.
>>>> Similarly, this index has a "dataverse" which is never used.
>>>
>>> You're about to use the dataverse part, right? :-) This index seems like
>>> it will be useful but should be a proper index.
>>>
>>>> The third index is used to go in two cases, using two different ideas of
>>>> "keys"
>>>> It seems like this should actually be two different indexes.
>>>
>>> I don't think I understood this comment....
>>>
>>>> This is my understanding so far. It would be good to discuss what the
>>>> "correct" version should be.
>>>> Steven
>>>>
>>>> On Mon, Dec 14, 2015 at 10:12 AM, Steven Jacobs <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm implementing a change so that datasets can use datatypes from
>>>>> alternate dataverses (previously the type and set had to be from the
>>>>> same dataverse). Unfortunately this means another change for Dataset
>>>>> Metadata (which will now store the dataverse for its type).
>>>>>
>>>>> As such, I had a couple of questions:
>>>>>
>>>>> 1) Should this change be thrown into the release branch, as it is
>>>>> another Metadata change?
>>>>>
>>>>> 2) In implementing this change, I've been looking at the Metadata
>>>>> secondary indexes. I had a discussion with Ildar, and it seems the
>>>>> thread on Metadata secondary indexes being "hacked" has been lost. Is
>>>>> this also something that should get into the release? Is there anyone
>>>>> currently looking at it?
>>>>>
>>>>> Steven
