Re: Branch for HIVE-4160

Namit Jain Mon, 08 Apr 2013 19:35:35 -0700

Sounds good to me


On 4/9/13 12:04 AM, "Jitendra Pandey" <[email protected]> wrote:

>I agree that we shouldn't wait too long before merging the branch.
>We aim to have basic queries working within a month from now and
>will definitely propose to merge the branch back into trunk at that point.
>We will limit the scope of the work on the branch to just a few operators
>and primitive datatypes. Does that sound reasonable?
>
>regards
>jitendra
>
>On Wed, Apr 3, 2013 at 9:03 PM, Namit Jain <[email protected]> wrote:
>
>> There is no right answer, but I feel that if you go a long way down this
>> path, it will be very difficult to merge back. Given that this is not new
>> functionality but an improvement to existing code (which will also evolve),
>> it will become difficult to maintain/review a big diff in the future.
>>
>> I haven't thought much about it, but you can start by creating the
>> high-level interfaces first, and then go from there. For example, create
>> interfaces for operators which take in an array of rows instead of a
>> single row; initially the array size can always be 1. Then proceed from
>> there.
>>
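The interface idea above could be sketched roughly as follows in Java. This is only an illustration under stated assumptions: `RowOperator`, `BatchOperator`, and `adapt` are hypothetical names, not Hive classes. A row-at-a-time operator is wrapped in a plain for loop so that it accepts an array of rows, and callers can start by passing batches of size 1.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchOperatorSketch {

    /** Stand-in for a row-at-a-time operator (hypothetical, not Hive's Operator class). */
    interface RowOperator {
        /** Returns the transformed row, or null to drop it. */
        Object[] process(Object[] row);
    }

    /** Hypothetical batch interface: takes an array of rows instead of a single row. */
    interface BatchOperator {
        Object[][] processBatch(Object[][] rows);
    }

    /** Wraps a row operator in a plain for loop so it satisfies the batch interface. */
    static BatchOperator adapt(RowOperator op) {
        return rows -> {
            List<Object[]> kept = new ArrayList<>(rows.length);
            for (Object[] row : rows) {
                Object[] result = op.process(row);
                if (result != null) {
                    kept.add(result);
                }
            }
            return kept.toArray(new Object[0][]);
        };
    }

    public static void main(String[] args) {
        // A filter-like row operator: keep rows whose first column is positive.
        RowOperator keepPositive = row -> ((Integer) row[0]) > 0 ? row : null;
        BatchOperator batched = adapt(keepPositive);

        // Initially every call can pass a batch of size 1.
        Object[][] out = batched.processBatch(new Object[][] {{42}});
        System.out.println("rows kept: " + out.length);
    }
}
```

Once callers use the batch interface, the loop inside an individual operator can later be replaced by a genuinely vectorized implementation without changing its callers.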
>> What makes you think merging a branch 6 months/1 year from now will be
>> easier than working on the current branch?
>>
>> Having said that, both approaches can be made to work, but I think you
>> are just delaying the merging work instead of taking the hit upfront.
>>
>> Thanks,
>> -namit
>>
>>
>>
>> On 4/4/13 2:40 AM, "Jitendra Pandey" <[email protected]> wrote:
>>
>> >   We did consider implementing these changes on the trunk. But it would
>> >take several patches in various parts of the code before a simple
>> >end-to-end query can be executed on the vectorized path. For example, a
>> >patch for vectorized expressions will be a significant amount of code, but
>> >will not be used in a query until a vectorized operator is implemented and
>> >the query plan is modified to use the vectorized path. Vectorization of
>> >even basic expressions becomes non-trivial because we need to optimize for
>> >various cases such as chains of expressions, non-null columns, or repeating
>> >values, and also handle nullable columns, short-circuit optimization, etc.
>> >Careful handling of these is important for performance gains.
>> >
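To make the special cases above concrete, here is a minimal Java sketch of one vectorized expression (column plus scalar) with separate paths for repeating values, non-null columns, and nullable columns. The `LongColumn` layout and the `addScalar` name are assumptions made for illustration; they are not taken from the HIVE-4160 design document.

```java
final class VectorizedAddSketch {

    /** Hypothetical column batch: values plus null and repetition metadata. */
    static final class LongColumn {
        final long[] values;
        final boolean[] isNull;
        boolean noNulls = true;       // true: isNull[] can be ignored entirely
        boolean isRepeating = false;  // true: entry 0 stands for every row

        LongColumn(int size) {
            values = new long[size];
            isNull = new boolean[size];
        }
    }

    /** Computes out = in + scalar for the first n rows of the batch. */
    static void addScalar(LongColumn in, long scalar, LongColumn out, int n) {
        out.noNulls = in.noNulls;
        out.isRepeating = in.isRepeating;

        if (in.isRepeating) {
            // Repeating values: evaluate once instead of n times.
            out.isNull[0] = !in.noNulls && in.isNull[0];
            if (!out.isNull[0]) {
                out.values[0] = in.values[0] + scalar;
            }
            return;
        }
        if (in.noNulls) {
            // Non-null column: tight loop with no per-row null check.
            // out.isNull[] is left untouched because out.noNulls is true.
            for (int i = 0; i < n; i++) {
                out.values[i] = in.values[i] + scalar;
            }
            return;
        }
        // Nullable column: propagate null flags and skip null rows.
        for (int i = 0; i < n; i++) {
            out.isNull[i] = in.isNull[i];
            if (!in.isNull[i]) {
                out.values[i] = in.values[i] + scalar;
            }
        }
    }

    public static void main(String[] args) {
        LongColumn in = new LongColumn(4);
        in.values[0] = 10;
        in.isRepeating = true;        // all four rows hold the value 10
        LongColumn out = new LongColumn(4);
        addScalar(in, 5, out, 4);
        System.out.println(out.values[0] + " repeating=" + out.isRepeating);
    }
}
```

Each additional expression needs the same case analysis, which is one reason even a basic set of vectorized expressions amounts to a sizable body of code.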
>> >   Committing those intermediate patches to trunk without stabilizing them
>> >in a branch first might be a cause for concern.
>> >
>> >   A separate branch will let us make incremental changes to the system so
>> >that each patch addresses a single feature or functionality and is small
>> >enough to review.
>> >   We will make sure that the branch is frequently updated with the changes
>> >in the trunk to avoid conflicts at the time of the merge.
>> >   Also, we plan to propose merging the branch as soon as a basic
>> >end-to-end query begins to work and is sufficiently tested, instead of
>> >waiting for all operators to get vectorized. Initially our target is to
>> >make select and filter operators work with vectorized expressions for
>> >primitive types.
>> >
>> >   We will have a single global configuration flag that can be used to
>> >turn off the entire vectorization code path, and we will specifically test
>> >to make sure that when this flag is off there is no regression on the
>> >current system. When vectorization is turned on, we will have a validation
>> >step to make sure the given query is supported on the vectorization path;
>> >otherwise it will fall back to the current code path.
>> >
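The flag-plus-validation behaviour described above might look roughly like this in Java. Everything here is a hypothetical sketch: the class, the `choosePath` and `validate` methods, and the operator and type names are invented for illustration, and a query is reduced to a list of operator names and column types.

```java
import java.util.List;
import java.util.Set;

final class VectorizationFallbackSketch {

    enum ExecutionPath { ROW_AT_A_TIME, VECTORIZED }

    // Assumed initial scope: select and filter operators over primitive types.
    static final Set<String> SUPPORTED_OPERATORS = Set.of("SELECT", "FILTER");
    static final Set<String> SUPPORTED_TYPES = Set.of("int", "bigint", "double");

    /** Validation step: every operator and column type must be supported. */
    static boolean validate(List<String> operators, List<String> columnTypes) {
        return SUPPORTED_OPERATORS.containsAll(operators)
                && SUPPORTED_TYPES.containsAll(columnTypes);
    }

    /** Chooses the execution path for one query. */
    static ExecutionPath choosePath(boolean vectorizationEnabled,
                                    List<String> operators,
                                    List<String> columnTypes) {
        if (!vectorizationEnabled) {
            // Global flag off: the vectorized code path is never entered.
            return ExecutionPath.ROW_AT_A_TIME;
        }
        // Flag on: unsupported queries fall back to the existing path.
        return validate(operators, columnTypes)
                ? ExecutionPath.VECTORIZED
                : ExecutionPath.ROW_AT_A_TIME;
    }

    public static void main(String[] args) {
        System.out.println(choosePath(true, List.of("SELECT", "FILTER"), List.of("int")));
        System.out.println(choosePath(true, List.of("SELECT", "JOIN"), List.of("int")));
        System.out.println(choosePath(false, List.of("SELECT"), List.of("int")));
    }
}
```

Checking the flag before validation means that with the flag off, no vectorization code runs at all, which keeps the no-regression test simple.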
>> >   Although we intend to follow a commit-then-review policy on the branch
>> >for speed of development, each patch will have an associated jira and will
>> >be available for review and feedback.
>> >
>> >thanks
>> >jitendra
>> >
>> >On Tue, Apr 2, 2013 at 8:37 PM, Namit Jain <[email protected]> wrote:
>> >
>> >> It will be difficult to merge back the branch.
>> >> Can you stage your changes incrementally?
>> >>
>> >> I mean, start with making the operators vectorized; it can be a for
>> >> loop to start with. I think it will be very difficult to merge it back
>> >> if we diverge on this.
>> >> I would recommend starting with simple interfaces for operators and then
>> >> plugging them in slowly instead of a new branch, unless this approach is
>> >> extremely difficult.
>> >>
>> >>
>> >> Thanks,
>> >> -namit
>> >>
>> >> On 4/3/13 1:52 AM, "Jitendra Pandey" <[email protected]> wrote:
>> >>
>> >> >Hi Folks,
>> >> >     I want to propose creating a separate branch for the HIVE-4160
>> >> >work. This is a significant amount of work, and support for even very
>> >> >basic functionality will need big chunks of code. It will also take some
>> >> >time to stabilize and test. A separate dev branch will allow us to do
>> >> >this work incrementally and collaboratively. We have already uploaded a
>> >> >design document on the jira for comments/feedback.
>> >> >
>> >> >thanks
>> >> >jitendra
>> >> >
>> >> >
>> >> >--
>> >> ><http://hortonworks.com/download/>
>> >>
>> >>
>> >
>> >
>> >--
>> ><http://hortonworks.com/download/>
>>
>>
>
>
>-- 
><http://hortonworks.com/download/>
