Re: Metadata changes

Steven Jacobs Mon, 14 Dec 2015 17:30:50 -0800

There are two cases where the code is attempting to use indexes:
1) When deleting a type, find and delete the anonymous subtypes.
2) When deleting a type, confirm that it is not used as a nested type of
another type.


Ignoring the "indexes" that we have in Metadata, Datatype records have a
field called "Fields" which contains a list of the fields within the type.
Each value in this list has a "fieldname" and a "fieldtype".

For 1, we can simply iterate through this list and call delete recursively
whenever "fieldtype" is anonymous and not primitive.
For 2, we need some way to find the parent types of a given type. Without an
index, that means scanning the "Fields" list of every Datatype record. The
only way to do this quickly would be to create an index on Fields.fieldtype,
which is a field within a record within a list.
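
To make the two cases concrete, here is a rough Python sketch. It is not the
actual Metadata code: the in-memory dict keyed by type name, the "anonymous"
flag, the PRIMITIVES set, and all function names are assumptions made only for
illustration, and it ignores dataverses entirely.

```python
# Hypothetical in-memory stand-in for Datatype records; not the real Metadata API.
PRIMITIVES = {"int32", "string", "double"}  # assumed primitive type names


def delete_type(types, name):
    """Case 1: delete a type and, recursively, its anonymous subtypes."""
    record = types.pop(name, None)
    if record is None:
        return
    for field in record["Fields"]:
        child = field["fieldtype"]
        if child in PRIMITIVES:
            continue
        # Only anonymous subtypes are owned by the type being deleted.
        if child in types and types[child]["anonymous"]:
            delete_type(types, child)


def parents_by_scan(types, name):
    """Case 2 without an index: scan the Fields list of every Datatype record."""
    return sorted(
        t for t, rec in types.items()
        if any(f["fieldtype"] == name for f in rec["Fields"])
    )


def build_fieldtype_index(types):
    """Case 2 with an index on Fields.fieldtype: child type -> set of parent types."""
    index = {}
    for t, rec in types.items():
        for f in rec["Fields"]:
            index.setdefault(f["fieldtype"], set()).add(t)
    return index
```

With the index, the check for 2 becomes a single lookup on the type name
instead of a pass over all Datatype records, which is the whole point of
indexing a field inside the list.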

Steven

On Monday, December 14, 2015, Mike Carey <[email protected]> wrote:

> Can you briefly explain why option 3 is so heavy? (Remind us how the use
> info is modeled?)
>
> On 12/14/15 3:43 PM, Steven Jacobs wrote:
>
>> We just had a UCR discussion on this topic. The issue is really with the
>> third "index" here. The code now is using one "index" to go in two
>> directions:
>> 1) To find datatypes that use datatype A
>> 2) To find datatypes that are used by datatype A.
>>
>> The way that it works now is hacked together, but designed for
>> performance. So we have three choices here:
>>
>> 1) Stick to the status quo, and leave the "indexes" as they are
>> 2) Remove the Metadata secondary indexes, which will eliminate the hack
>> but cost some performance on Metadata
>> 3) Implement the Metadata secondary indexes correctly as Asterix indexes.
>> For this solution to work with our dataset designs, we will need to have
>> the ability to index homogeneous lists. In addition, we will have backward
>> compatibility issues unless we plan things out for the transition.
>>
>> What are the thoughts?
>>
>>
>> Orthogonally, it seems that the consensus for storing the datatype
>> dataverse in the dataset Metadata is to just add it as an open field at
>> least for now. Is that correct?
>>
>> Steven
>>
>>
>> On Mon, Dec 14, 2015 at 1:23 PM, Mike Carey <[email protected]> wrote:
>>
>>> Thoughts inlined:
>>>
>>> On 12/14/15 11:12 AM, Steven Jacobs wrote:
>>>
>>>> Here are the conclusions that Ildar and I have drawn from looking at the
>>>> secondary indexes:
>>>>
>>>> First of all it seems that datasets are local to node groups, but
>>>> dataverses can span node groups, which seems a little odd to me.
>>>>
>>> Node groups are an undocumented but to-be-exploited-someday feature that
>>> allows datasets to be stored on fewer than all nodes in a given cluster.
>>> As we face bigger clusters, we'll want to open up that possibility. We will
>>> hopefully use them internally w/o having to make users manage them manually
>>> like parallel DB2 did/does. Dataverses are really just a namespace thing,
>>> not a storage thing at all, so they are orthogonal to (and unrelated to)
>>> node groups.
>>>
>>>> There are three Metadata secondary indexes: GROUPNAME_ON_DATASET_INDEX,
>>>> DATATYPENAME_ON_DATASET_INDEX, and DATATYPENAME_ON_DATATYPE_INDEX.
>>>>
>>>> The first is used in only one case:
>>>> When dropping a node group, check if there are any datasets using this
>>>> node group. If so, don't allow the drop.
>>>> BUT, this index has a field called "dataverse" which is not used at all.
>>>>
>>> This one seems like a waste of space since we do this almost never. (Not
>>> much space, but unnecessary.) If we keep it, it should become a proper
>>> index.
>>>
>>>> The second is used when dropping a datatype. If there is a dataset using
>>>> this datatype, don't allow the drop.
>>>> Similarly, this index has a "dataverse" field which is never used.
>>>>
>>> You're about to use the dataverse part, right?  :-)  This index seems like
>>> it will be useful but should be a proper index.
>>>
>>>> The third index is used in two ways, using two different ideas of
>>>> "keys".
>>>> It seems like this should actually be two different indexes.
>>>>
>>> I don't think I understood this comment....
>>>
>>>
>>>> This is my understanding so far. It would be good to discuss what the
>>>> "correct" version should be.
>>>> Steven
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Dec 14, 2015 at 10:12 AM, Steven Jacobs <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm implementing a change so that datasets can use datatypes from
>>>>> other dataverses (previously the type and the dataset had to be from the
>>>>> same dataverse). Unfortunately this means another change for Dataset
>>>>> Metadata (which will now store the dataverse for its type).
>>>>>
>>>>> As such, I had a couple of questions:
>>>>>
>>>>> 1) Should this change be thrown into the release branch, as it is
>>>>> another Metadata change?
>>>>>
>>>>> 2) In implementing this change, I've been looking at the Metadata
>>>>> secondary indexes. I had a discussion with Ildar, and it seems the
>>>>> thread on Metadata secondary indexes being "hacked" has been lost. Is
>>>>> this also something that should get into the release? Is there anyone
>>>>> currently looking at it?
>>>>>
>>>>> Steven
>>>>>
>>>>>
>>>>>
>
