I'm not sure I follow, you'd still need permission to change the
dataset files. Unless you mean to commit them to git? However I don't
think thats an option either because even at the trivial size they are
now (2 pages) that's huge git-wise.

On Thu, May 11, 2017 at 8:41 AM, abdullah alamoudi <[email protected]> wrote:
> It will not require special access to jenkins...
>
>
>> On May 10, 2017, at 8:06 PM, Ian Maxon <[email protected]> wrote:
>>
>> I don't really see how that would require any less intervention than
>> the current job. As it stands, once it's decided that breaking the
>> storage for a patch is fine, all we need to do is override the -1
>> Verified and merge it. What it checks against is HEAD and HEAD~1 of
>> the patch, not the release version.
>>
>> On Wed, May 10, 2017 at 7:59 PM, abdullah alamoudi <[email protected] 
>> <mailto:[email protected]>> wrote:
>>> A follow up on this. So this change has been through a round of review and 
>>> will be merged soon. Currently, it fails the storage check test. The 
>>> storage check test works as follow:
>>>
>>> It builds some artifacts using the previous asterixdb version and then 
>>> tries to read them using the build with the change. The test itself is good 
>>> as it catches those changes that modifies the storage files but not very 
>>> convenient as one has to have access to jenkins to override them.
>>> What I propose is to disable this test and have binary data files with a 
>>> test that reads them. If a change changes any storage related classes, then 
>>> it will still fail those tests until the files are updated. At which point, 
>>> the reviewer should catch that and if it is a legitimate change, then it 
>>> should be allowed in.
>>>
>>> Eventually, we should have backward compatibility storage and/or a 
>>> migration facility and then maybe we can put those back on jenkins.
>>>
>>> Thoughts? Proposals?
>>>
>>>
>>>> On May 8, 2017, at 10:59 PM, abdullah alamoudi <[email protected]> wrote:
>>>>
>>>> Devs,
>>>> For some time, out storage code has been suffering from incremental design 
>>>> changes through additional use cases that come along research projects.
>>>> We have done some cleanup but the code base still suffered from lots of 
>>>> duplicate code and unneeded work (for both developers and CPUs).
>>>>
>>>> One thing we used to do is whenever we need to access an index, we have to 
>>>> create its "IDataflowHelperFactory". This object will contain most things 
>>>> that defines the index. Interestingly, it didn't contain all that is 
>>>> needed.
>>>> For some obscure reason, typeTraits, comparatorFactories, and 
>>>> bloomFilterKeyFields were places in TreeIndexCreateOperatorDescriptor.
>>>> Different indexes had different dataflow helpers and so sometimes, 
>>>> multiple class definitions are needed for operators per index type. and 
>>>> this leads to further bloat of the system's code.
>>>>
>>>> If one thinks about it, the index related objects that are needed during 
>>>> index construction are only needed when the index gets created. Further 
>>>> needs to access the index for any reason should only gets the index key 
>>>> (the path in this case).
>>>> In fact, if one follows the execution of the code, they will see that that 
>>>> is exactly what is needed and that whenever we try to 
>>>> search/insert/delete/upsert/bulkload, we recompute many artifacts that end 
>>>> up being useless.
>>>>
>>>> So, I proposed a change that fixes this part. Index related objects are 
>>>> only provided at index creation time and for index access, only the index 
>>>> path is required. This is done by removing the create method from the 
>>>> IIndexDataflowHelper interface and moving it to IIndexBuilder.
>>>> With this, all implementations of IIndexDataflowHelper are now in a single 
>>>> class that basically uses the path to fetch the index on a Node Controller.
>>>>
>>>> This change gets rid of more than 3000 lines of code and makes things much 
>>>> cleaner. Classes that should go to hyracks are moved to hyracks. Asterix 
>>>> related information such as dataset id and partition number are kept in 
>>>> asterix through the introduction of DatasetLocalResource. To show the 
>>>> effect of this change, you can look at an example in 
>>>> https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java
>>>>  
>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java>
>>>>  
>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java
>>>>  
>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java>>
>>>>
>>>> Look at the amount of work that was unneeded to access an inverted index. 
>>>> The only thing that was actually needed is the index FileSplitProvider. 
>>>> Take another look at 
>>>> https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java
>>>>  
>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java>
>>>>  
>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java
>>>>  
>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java>>
>>>>  and see how much unneeded code gets removed. This is not enough on the 
>>>> asterixdb side but it creates a good foundation that would allow existing 
>>>> and new code to get cleaner.
>>>>
>>>> Please, take a look at the change 
>>>> https://asterix-gerrit.ics.uci.edu/#/c/1728/ 
>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/> 
>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/ 
>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/>> if interested and let me 
>>>> know if you have any comments. Note that it fails storage check test and 
>>>> that is expected because it changes the persisted resource info classes.
>>>>
>>>> Cheers,
>>>> Abdullah.
>

Reply via email to