Re: Yet another storage cleanup and a bit of history

Ian Maxon Thu, 11 May 2017 13:27:45 -0700

I'm fine with changing the ability to override as it's recorded as
part of the commit. I don't see any reason it can't be tied to
committer privs.


On Thu, May 11, 2017 at 1:13 PM, Till Westmann <[email protected]> wrote:
> I think that the point is, that the gerrit/jenkins admin rights are not
> necessarily aligned with the committer rights. And so storing and changing
> binaries in git is something everybody can do, while overriding in gerrit is
> not (as least I think that’s the way it is).
> So I think that we should either change the test or the ability to override.
>
> Cheers,
> Till
>
>
> On 11 May 2017, at 12:30, Ian Maxon wrote:
>
>> I'm not sure I follow, you'd still need permission to change the
>> dataset files. Unless you mean to commit them to git? However I don't
>> think thats an option either because even at the trivial size they are
>> now (2 pages) that's huge git-wise.
>>
>> On Thu, May 11, 2017 at 8:41 AM, abdullah alamoudi <[email protected]>
>> wrote:
>>>
>>> It will not require special access to jenkins...
>>>
>>>
>>>> On May 10, 2017, at 8:06 PM, Ian Maxon <[email protected]> wrote:
>>>>
>>>> I don't really see how that would require any less intervention than
>>>> the current job. As it stands, once it's decided that breaking the
>>>> storage for a patch is fine, all we need to do is override the -1
>>>> Verified and merge it. What it checks against is HEAD and HEAD~1 of
>>>> the patch, not the release version.
>>>>
>>>> On Wed, May 10, 2017 at 7:59 PM, abdullah alamoudi <[email protected]
>>>> <mailto:[email protected]>> wrote:
>>>>>
>>>>> A follow up on this. So this change has been through a round of review
>>>>> and will be merged soon. Currently, it fails the storage check test. The
>>>>> storage check test works as follow:
>>>>>
>>>>> It builds some artifacts using the previous asterixdb version and then
>>>>> tries to read them using the build with the change. The test itself is 
>>>>> good
>>>>> as it catches those changes that modifies the storage files but not very
>>>>> convenient as one has to have access to jenkins to override them.
>>>>> What I propose is to disable this test and have binary data files with
>>>>> a test that reads them. If a change changes any storage related classes,
>>>>> then it will still fail those tests until the files are updated. At which
>>>>> point, the reviewer should catch that and if it is a legitimate change, 
>>>>> then
>>>>> it should be allowed in.
>>>>>
>>>>> Eventually, we should have backward compatibility storage and/or a
>>>>> migration facility and then maybe we can put those back on jenkins.
>>>>>
>>>>> Thoughts? Proposals?
>>>>>
>>>>>
>>>>>> On May 8, 2017, at 10:59 PM, abdullah alamoudi <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> Devs,
>>>>>> For some time, out storage code has been suffering from incremental
>>>>>> design changes through additional use cases that come along research
>>>>>> projects.
>>>>>> We have done some cleanup but the code base still suffered from lots
>>>>>> of duplicate code and unneeded work (for both developers and CPUs).
>>>>>>
>>>>>> One thing we used to do is whenever we need to access an index, we
>>>>>> have to create its "IDataflowHelperFactory". This object will contain 
>>>>>> most
>>>>>> things that defines the index. Interestingly, it didn't contain all that 
>>>>>> is
>>>>>> needed.
>>>>>> For some obscure reason, typeTraits, comparatorFactories, and
>>>>>> bloomFilterKeyFields were places in TreeIndexCreateOperatorDescriptor.
>>>>>> Different indexes had different dataflow helpers and so sometimes,
>>>>>> multiple class definitions are needed for operators per index type. and 
>>>>>> this
>>>>>> leads to further bloat of the system's code.
>>>>>>
>>>>>> If one thinks about it, the index related objects that are needed
>>>>>> during index construction are only needed when the index gets created.
>>>>>> Further needs to access the index for any reason should only gets the 
>>>>>> index
>>>>>> key (the path in this case).
>>>>>> In fact, if one follows the execution of the code, they will see that
>>>>>> that is exactly what is needed and that whenever we try to
>>>>>> search/insert/delete/upsert/bulkload, we recompute many artifacts that 
>>>>>> end
>>>>>> up being useless.
>>>>>>
>>>>>> So, I proposed a change that fixes this part. Index related objects
>>>>>> are only provided at index creation time and for index access, only the
>>>>>> index path is required. This is done by removing the create method from 
>>>>>> the
>>>>>> IIndexDataflowHelper interface and moving it to IIndexBuilder.
>>>>>> With this, all implementations of IIndexDataflowHelper are now in a
>>>>>> single class that basically uses the path to fetch the index on a Node
>>>>>> Controller.
>>>>>>
>>>>>> This change gets rid of more than 3000 lines of code and makes things
>>>>>> much cleaner. Classes that should go to hyracks are moved to hyracks.
>>>>>> Asterix related information such as dataset id and partition number are 
>>>>>> kept
>>>>>> in asterix through the introduction of DatasetLocalResource. To show the
>>>>>> effect of this change, you can look at an example in
>>>>>> https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java
>>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java>
>>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java
>>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java>>
>>>>>>
>>>>>> Look at the amount of work that was unneeded to access an inverted
>>>>>> index. The only thing that was actually needed is the index
>>>>>> FileSplitProvider. Take another look at
>>>>>> https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java
>>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java>
>>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java
>>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java>>
>>>>>> and see how much unneeded code gets removed. This is not enough on the
>>>>>> asterixdb side but it creates a good foundation that would allow existing
>>>>>> and new code to get cleaner.
>>>>>>
>>>>>> Please, take a look at the change
>>>>>> https://asterix-gerrit.ics.uci.edu/#/c/1728/
>>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/>
>>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/
>>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/>> if interested and let me
>>>>>> know if you have any comments. Note that it fails storage check test and
>>>>>> that is expected because it changes the persisted resource info classes.
>>>>>>
>>>>>> Cheers,
>>>>>> Abdullah.
>>>
>>>
>

Re: Yet another storage cleanup and a bit of history

Reply via email to