Re: Yet another storage cleanup and a bit of history

Ian Maxon Wed, 10 May 2017 20:06:46 -0700

I don't really see how that would require any less intervention than
the current job. As it stands, once it's decided that breaking the
storage for a patch is fine, all we need to do is override the -1
Verified and merge it. What it checks against is HEAD and HEAD~1 of
the patch, not the release version.


On Wed, May 10, 2017 at 7:59 PM, abdullah alamoudi <[email protected]> wrote:
> A follow up on this. So this change has been through a round of review and 
> will be merged soon. Currently, it fails the storage check test. The storage 
> check test works as follow:
>
> It builds some artifacts using the previous asterixdb version and then tries 
> to read them using the build with the change. The test itself is good as it 
> catches those changes that modifies the storage files but not very convenient 
> as one has to have access to jenkins to override them.
> What I propose is to disable this test and have binary data files with a test 
> that reads them. If a change changes any storage related classes, then it 
> will still fail those tests until the files are updated. At which point, the 
> reviewer should catch that and if it is a legitimate change, then it should 
> be allowed in.
>
> Eventually, we should have backward compatibility storage and/or a migration 
> facility and then maybe we can put those back on jenkins.
>
> Thoughts? Proposals?
>
>
>> On May 8, 2017, at 10:59 PM, abdullah alamoudi <[email protected]> wrote:
>>
>> Devs,
>> For some time, out storage code has been suffering from incremental design 
>> changes through additional use cases that come along research projects.
>> We have done some cleanup but the code base still suffered from lots of 
>> duplicate code and unneeded work (for both developers and CPUs).
>>
>> One thing we used to do is whenever we need to access an index, we have to 
>> create its "IDataflowHelperFactory". This object will contain most things 
>> that defines the index. Interestingly, it didn't contain all that is needed.
>> For some obscure reason, typeTraits, comparatorFactories, and 
>> bloomFilterKeyFields were places in TreeIndexCreateOperatorDescriptor.
>> Different indexes had different dataflow helpers and so sometimes, multiple 
>> class definitions are needed for operators per index type. and this leads to 
>> further bloat of the system's code.
>>
>> If one thinks about it, the index related objects that are needed during 
>> index construction are only needed when the index gets created. Further 
>> needs to access the index for any reason should only gets the index key (the 
>> path in this case).
>> In fact, if one follows the execution of the code, they will see that that 
>> is exactly what is needed and that whenever we try to 
>> search/insert/delete/upsert/bulkload, we recompute many artifacts that end 
>> up being useless.
>>
>> So, I proposed a change that fixes this part. Index related objects are only 
>> provided at index creation time and for index access, only the index path is 
>> required. This is done by removing the create method from the 
>> IIndexDataflowHelper interface and moving it to IIndexBuilder.
>> With this, all implementations of IIndexDataflowHelper are now in a single 
>> class that basically uses the path to fetch the index on a Node Controller.
>>
>> This change gets rid of more than 3000 lines of code and makes things much 
>> cleaner. Classes that should go to hyracks are moved to hyracks. Asterix 
>> related information such as dataset id and partition number are kept in 
>> asterix through the introduction of DatasetLocalResource. To show the effect 
>> of this change, you can look at an example in 
>> https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java
>>  
>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java>
>>
>> Look at the amount of work that was unneeded to access an inverted index. 
>> The only thing that was actually needed is the index FileSplitProvider. Take 
>> another look at 
>> https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java
>>  
>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java>
>>  and see how much unneeded code gets removed. This is not enough on the 
>> asterixdb side but it creates a good foundation that would allow existing 
>> and new code to get cleaner.
>>
>> Please, take a look at the change 
>> https://asterix-gerrit.ics.uci.edu/#/c/1728/ 
>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/> if interested and let me know 
>> if you have any comments. Note that it fails storage check test and that is 
>> expected because it changes the persisted resource info classes.
>>
>> Cheers,
>> Abdullah.
>

Re: Yet another storage cleanup and a bit of history

Reply via email to