Re: Yet another storage cleanup and a bit of history

abdullah alamoudi Thu, 11 May 2017 08:42:06 -0700

It will not require special access to jenkins...


> On May 10, 2017, at 8:06 PM, Ian Maxon <[email protected]> wrote:
> 
> I don't really see how that would require any less intervention than
> the current job. As it stands, once it's decided that breaking the
> storage for a patch is fine, all we need to do is override the -1
> Verified and merge it. What it checks against is HEAD and HEAD~1 of
> the patch, not the release version.
> 
> On Wed, May 10, 2017 at 7:59 PM, abdullah alamoudi <[email protected] 
> <mailto:[email protected]>> wrote:
>> A follow up on this. So this change has been through a round of review and 
>> will be merged soon. Currently, it fails the storage check test. The storage 
>> check test works as follow:
>> 
>> It builds some artifacts using the previous asterixdb version and then tries 
>> to read them using the build with the change. The test itself is good as it 
>> catches those changes that modifies the storage files but not very 
>> convenient as one has to have access to jenkins to override them.
>> What I propose is to disable this test and have binary data files with a 
>> test that reads them. If a change changes any storage related classes, then 
>> it will still fail those tests until the files are updated. At which point, 
>> the reviewer should catch that and if it is a legitimate change, then it 
>> should be allowed in.
>> 
>> Eventually, we should have backward compatibility storage and/or a migration 
>> facility and then maybe we can put those back on jenkins.
>> 
>> Thoughts? Proposals?
>> 
>> 
>>> On May 8, 2017, at 10:59 PM, abdullah alamoudi <[email protected]> wrote:
>>> 
>>> Devs,
>>> For some time, out storage code has been suffering from incremental design 
>>> changes through additional use cases that come along research projects.
>>> We have done some cleanup but the code base still suffered from lots of 
>>> duplicate code and unneeded work (for both developers and CPUs).
>>> 
>>> One thing we used to do is whenever we need to access an index, we have to 
>>> create its "IDataflowHelperFactory". This object will contain most things 
>>> that defines the index. Interestingly, it didn't contain all that is needed.
>>> For some obscure reason, typeTraits, comparatorFactories, and 
>>> bloomFilterKeyFields were places in TreeIndexCreateOperatorDescriptor.
>>> Different indexes had different dataflow helpers and so sometimes, multiple 
>>> class definitions are needed for operators per index type. and this leads 
>>> to further bloat of the system's code.
>>> 
>>> If one thinks about it, the index related objects that are needed during 
>>> index construction are only needed when the index gets created. Further 
>>> needs to access the index for any reason should only gets the index key 
>>> (the path in this case).
>>> In fact, if one follows the execution of the code, they will see that that 
>>> is exactly what is needed and that whenever we try to 
>>> search/insert/delete/upsert/bulkload, we recompute many artifacts that end 
>>> up being useless.
>>> 
>>> So, I proposed a change that fixes this part. Index related objects are 
>>> only provided at index creation time and for index access, only the index 
>>> path is required. This is done by removing the create method from the 
>>> IIndexDataflowHelper interface and moving it to IIndexBuilder.
>>> With this, all implementations of IIndexDataflowHelper are now in a single 
>>> class that basically uses the path to fetch the index on a Node Controller.
>>> 
>>> This change gets rid of more than 3000 lines of code and makes things much 
>>> cleaner. Classes that should go to hyracks are moved to hyracks. Asterix 
>>> related information such as dataset id and partition number are kept in 
>>> asterix through the introduction of DatasetLocalResource. To show the 
>>> effect of this change, you can look at an example in 
>>> https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java
>>>  
>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java>
>>>  
>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java
>>>  
>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java>>
>>> 
>>> Look at the amount of work that was unneeded to access an inverted index. 
>>> The only thing that was actually needed is the index FileSplitProvider. 
>>> Take another look at 
>>> https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java
>>>  
>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java>
>>>  
>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java
>>>  
>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java>>
>>>  and see how much unneeded code gets removed. This is not enough on the 
>>> asterixdb side but it creates a good foundation that would allow existing 
>>> and new code to get cleaner.
>>> 
>>> Please, take a look at the change 
>>> https://asterix-gerrit.ics.uci.edu/#/c/1728/ 
>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/> 
>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/ 
>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/>> if interested and let me 
>>> know if you have any comments. Note that it fails storage check test and 
>>> that is expected because it changes the persisted resource info classes.
>>> 
>>> Cheers,
>>> Abdullah.

Re: Yet another storage cleanup and a bit of history

Reply via email to