Okay, sounds good.

On Fri, Oct 13, 2017 at 2:50 PM, Wes McKinney <wesmck...@gmail.com> wrote:

> It is fine to have not-completely-working states in the refactor
> branch. I recommend do whatever is the most expedient thing to help
> with making progress.
>
> - Wes
>
> On Fri, Oct 13, 2017 at 5:42 PM, Siddharth Teotia <siddha...@dremio.com>
> wrote:
> > Li,
> >
> > I think there is some confusion. Are you suggesting merging into "java
> > vector refactor" branch or the master? Is it fine to merge stuff on the
> > former branch even though few things are broken (around 10 tests) ? If
> this
> > is allowed, I can do some cleanup (some documentation, some TODOs
> suggested
> > by you and Brian) and we can merge the current patch by EOD or over the
> > weekend.
> >
> > Is this okay? Since we are going to iterate over this branch and not
> going
> > to push anything to master until new code is stable, we are probably
> good.
> >
> > Thanks,
> > Siddhath
> >
> >
> >
> > On Fri, Oct 13, 2017 at 12:17 PM, Li Jin <ice.xell...@gmail.com> wrote:
> >
> >> Siddharth,
> >>
> >> Regarding rename:
> >> Yes this can be done later.
> >>
> >> Tests:
> >> I agree having code like https://github.com/apache/
> >> arrow/pull/1164/files#diff-0876c9a0005d1dbaea321ea8d39d79ae is hard to
> >> maintain even temporarily. I am not sure what's the best way to resolve
> >> test failure wrt removing of the accessor/mutator from the vectors.
> Maybe
> >> we can have change the template the create non-accessor/mutator
> >> getter/setters and also remove acessor/mutator in the test for it to
> pass?
> >> What do you think is the easiest?
> >>
> >> Reader/Writer:
> >> Yes we can address this later.
> >>
> >> Apologies if I seem to add more work for merging https://github.com/
> >> apache/arrow/pull/1164, that's not my intention, I think the PR looks
> >> good -
> >> just want to bring up some major design decisions so people can comment
> and
> >> discuss.
> >>
> >> Li
> >>
> >>
> >>
> >>
> >> On Fri, Oct 13, 2017 at 2:37 PM, Siddharth Teotia <siddha...@dremio.com
> >
> >> wrote:
> >>
> >> > I am not quite sure of the need to rename the vectors. Why do we need
> to
> >> > rename? This would first require us to remove all the vectors
> generated
> >> by
> >> > FixedValueVectors.java as they are non-nullable scalar vectors.
> Removing
> >> > non-nullable vectors is one of the goals, but it can be done once the
> new
> >> > infrastructure is properly setup?
> >> >
> >> > In order to merge the existing patch, I first need to address some
> >> (10-15)
> >> > failures -- few of them are correctness issues w.r.t
> >> TestVectorUnloadLoad,
> >> > TestArrowFile and rest all are related to getMutator(), getAccessor()
> >> > throwing UnsupportedOperationException. This is why I was saying
> earlier
> >> > that I will end up doing a lot of rework by writing redundant code
> where
> >> > (if vector instanceof NullableInt or vector instanceof
> NullableVarChar)
> >> we
> >> > don't use the mutator/accessor and for other vectors we use it for the
> >> > current patch. These if conditions are getting complicated with ugly
> type
> >> > casting in some parts of the code --
> >> > https://github.com/apache/arrow/pull/1164/files#diff-
> >> > 0876c9a0005d1dbaea321ea8d39d79ae
> >> >
> >> > So I thought we can probably implement other vectors (remaining
> scalars,
> >> > map and list) where no vector has mutator/accessor and then for every
> >> > ValueVector, we can remove all calls to getMutator(), getAccessor() as
> >> > opposed to doing them selectively ---
> >> > https://github.com/apache/arrow/pull/1164/files#diff-
> >> > e9273a7b3b35ff7f40f101dc2cf95242
> >> >
> >> > I will try to address these failures by EOD and see if this patch can
> be
> >> > merged first.
> >> >
> >> > Regarding readers and writers, can we address them subsequently?
> >> >
> >> > On Fri, Oct 13, 2017 at 11:03 AM, Li Jin <ice.xell...@gmail.com>
> wrote:
> >> >
> >> > > Siddharth,
> >> > >
> >> > > Thanks for the update. I think it's fine to move forward with more
> >> > vectors,
> >> > > but in the mean time, I think we should also prioritize to merge
> >> > > https://github.com/apache/arrow/pull/1164, here are a few comments
> >> needs
> >> > > to
> >> > > be addressed.
> >> > >
> >> > > (1) Backward-compatibility:
> >> > > I think there is no way to maintain backward compability as the new
> >> > vector
> >> > > classes will be renamed, but want to confirm we are OK with this
> >> > decision.
> >> > > We also think the disruption on the Spark side are OK as Spark's use
> >> case
> >> > > is simple and Bryan and I can take care of the code change.
> >> > >
> >> > > (2) Reader/writer classes:
> >> > > How does the reader/writer classes interact with the new and legacy
> >> > vector
> >> > > classes:
> >> > >
> >> > > Discussion: https://github.com/apache/arrow/pull/1164#discussion_
> >> > > r144074264
> >> > >
> >> > > My thoughts are:
> >> > > (1) ArrowReader classes should only return new vector classes
> >> > > (2) ArrowWriter classes should only work with new vector classes
> >> > > (3) To read/write legacy vectors, we can use adapters to turn legacy
> >> > > vectors to new vectors (zero-copy, as the underlying buffers should
> be
> >> > > transferred directly)
> >> > >
> >> > > Jacques also has a few comments, I don't know if they have been
> >> > addressed.
> >> > >
> >> > > For other comments, I think we can add TODO and do it later. I
> think we
> >> > can
> >> > > merge this PR if we address (1) (2) above.
> >> > >
> >> > > Comments?
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > On Fri, Oct 13, 2017 at 12:36 PM, Siddharth Teotia <
> >> siddha...@dremio.com
> >> > >
> >> > > wrote:
> >> > >
> >> > > > The patch that I have put up https://github.com/apache/
> >> arrow/pull/1198
> >> > > > seems to be in a reasonable state. We are now working off a
> different
> >> > > > branch "java vector refactor".
> >> > > >
> >> > > > Now that we have the basic structure,  in order to make quick
> forward
> >> > > > progress, I would like to go ahead and do for other types (FLOAT,
> >> > BIGINT
> >> > > > etc), list, map and create their legacy
> >> > > > counter parts -- doing them in subsequent patches is requiring me
> to
> >> > > write
> >> > > > some duplicate code and redundant if conditions in code that
> expects
> >> > all
> >> > > > the vectors to have mutator/accessor.
> >> > > >
> >> > > > Is that fine? Just wanted to check with people and ensure there
> >> aren't
> >> > > any
> >> > > > major concerns.
> >> > > >
> >> > > > The feedback on the PR (original one for master
> >> > > > https://github.com/apache/arrow/pull/1164) has been really good
> --
> >> > some
> >> > > of
> >> > > > the comments are yet to be addressed and we jointly decided to
> >> address
> >> > > few
> >> > > > things (like Minor Type etc) after the refactoring has been done.
> >> > > >
> >> > > > On the testing front, as far as the correctness is concerned, I
> have
> >> > two
> >> > > > failures in TestArrowFile and TestValueVector. I have added some
> more
> >> > > tests
> >> > > > too.
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Thu, Oct 12, 2017 at 2:18 PM, Siddharth Teotia <
> >> > siddha...@dremio.com>
> >> > > > wrote:
> >> > > >
> >> > > > > Yes, that is the intention. Good that we all are on the same
> page.
> >> I
> >> > > will
> >> > > > > move the PR https://github.com/apache/arrow/pull/1164 to new
> >> branch.
> >> > > > >
> >> > > > > On Thu, Oct 12, 2017 at 11:20 AM, Li Jin <ice.xell...@gmail.com
> >
> >> > > wrote:
> >> > > > >
> >> > > > >> To make clear, I think it's fine to have Legacy Vectors in 0.8
> as
> >> a
> >> > > > >> deprecated API.
> >> > > > >>
> >> > > > >> On Thu, Oct 12, 2017 at 2:19 PM, Li Jin <ice.xell...@gmail.com
> >
> >> > > wrote:
> >> > > > >>
> >> > > > >> > Siddharth,
> >> > > > >> >
> >> > > > >> > For working off a branch, Wes has created
> >> > > https://github.com/apache/
> >> > > > >> > arrow/tree/java-vector-refactor that we can submit PR to.
> >> > > > >> >
> >> > > > >> > For Legacy vectors, I think it's fine because it's really
> just a
> >> > > > >> migration
> >> > > > >> > path to help Dremio to migrate to the new vectors. I don't
> think
> >> > > other
> >> > > > >> > users, i.e., Spark will use the Legacy vector class. Bryan
> and I
> >> > > will
> >> > > > >> just
> >> > > > >> > migrate Spark to new vectors directly because Spark's use of
> >> Arrow
> >> > > is
> >> > > > >> very
> >> > > > >> > simple.
> >> > > > >> >
> >> > > > >> >
> >> > > > >> >
> >> > > > >> > On Thu, Oct 12, 2017 at 2:08 PM, Siddharth Teotia <
> >> > > > siddha...@dremio.com
> >> > > > >> >
> >> > > > >> > wrote:
> >> > > > >> >
> >> > > > >> >> Thanks Bryan and Li.
> >> > > > >> >>
> >> > > > >> >> Yes, the goal is to get this (and the subsequent patches)
> >> merged
> >> > to
> >> > > > the
> >> > > > >> >> new
> >> > > > >> >> branch. Once it is stabilized from different aspects, we can
> >> move
> >> > > to
> >> > > > >> >> master. I am not sure of the exact mechanics when we work
> off a
> >> > > > >> different
> >> > > > >> >> project branch and not master.
> >> > > > >> >>
> >> > > > >> >> Does that sound good?
> >> > > > >> >>
> >> > > > >> >> Regarding compatibility, are we suggesting that let's not
> >> create
> >> > > > Legacy
> >> > > > >> >> Nullable vectors at all? The initial thoughts were to
> generate
> >> > > Legacy
> >> > > > >> >> vectors from NullableValueVectors template and these vectors
> >> are
> >> > > > >> >> mutator/accessor based (in today's world). Internally each
> >> > > operation
> >> > > > >> will
> >> > > > >> >> be delegated to new vectors (non code generated).
> >> > > > >> >>
> >> > > > >> >> On Thu, Oct 12, 2017 at 10:58 AM, Bryan Cutler <
> >> > cutl...@gmail.com>
> >> > > > >> wrote:
> >> > > > >> >>
> >> > > > >> >> > Thanks for the update Siddharth.  From the Spark side of
> >> this,
> >> > I
> >> > > > >> >> definitely
> >> > > > >> >> > want to try to upgrade to the latest Arrow before the
> Spark
> >> 2.3
> >> > > > >> release
> >> > > > >> >> but
> >> > > > >> >> > if it the refactor is too disruptive then others might get
> >> > > > squeamish
> >> > > > >> >> about
> >> > > > >> >> > upgrading.  On the other hand, I don't think we should
> hold
> >> > back
> >> > > on
> >> > > > >> >> > refactoring for compatibility sake and the way it's
> looking
> >> now
> >> > > > >> trying
> >> > > > >> >> to
> >> > > > >> >> > be backwards-compatible will be too much of a pain.  I
> will
> >> try
> >> > > to
> >> > > > >> >> figure
> >> > > > >> >> > out the timeline for Spark 2.3 and what the feeling is for
> >> > > > upgrading
> >> > > > >> >> > Arrow.  Can we hold off on merging this to master for now
> and
> >> > > just
> >> > > > >> work
> >> > > > >> >> out
> >> > > > >> >> > of the separate branch until we can get a better feeling
> for
> >> > the
> >> > > > >> impact?
> >> > > > >> >> >
> >> > > > >> >> > Bryan
> >> > > > >> >> >
> >> > > > >> >> > On Wed, Oct 11, 2017 at 8:19 AM, Li Jin <
> >> ice.xell...@gmail.com
> >> > >
> >> > > > >> wrote:
> >> > > > >> >> >
> >> > > > >> >> > > Hi Siddharth,
> >> > > > >> >> > >
> >> > > > >> >> > > Thanks for the update. This looks good.
> >> > > > >> >> > >
> >> > > > >> >> > > A few thoughts:
> >> > > > >> >> > >
> >> > > > >> >> > > *Compatibility:*
> >> > > > >> >> > > It sounds like we will introduce some back-compatibility
> >> with
> >> > > the
> >> > > > >> new
> >> > > > >> >> > > Vector class. At this point I think our main Java users
> >> > should
> >> > > be
> >> > > > >> >> Spark
> >> > > > >> >> > and
> >> > > > >> >> > > Dremio, is this right?
> >> > > > >> >> > >
> >> > > > >> >> > >
> >> > > > >> >> > >    - For Spark:
> >> > > > >> >> > >
> >> > > > >> >> > > It seems fine since Spark uses just the basic
> functionality
> >> > of
> >> > > > >> Vector
> >> > > > >> >> > > classes and the existing code should work with the new
> >> Vector
> >> > > > >> classes,
> >> > > > >> >> > > maybe even without any code change on the Spark side.
> >> > > > >> >> > >
> >> > > > >> >> > >
> >> > > > >> >> > >    - For Dremio:
> >> > > > >> >> > >
> >> > > > >> >> > > Sounds like you are already taking care of this by
> >> > introducing
> >> > > > the
> >> > > > >> >> > > LegacyVector classes.
> >> > > > >> >> > >
> >> > > > >> >> > >
> >> > > > >> >> > > *Testing:*
> >> > > > >> >> > >
> >> > > > >> >> > >    - Spark Integration Tests:
> >> > > > >> >> > >
> >> > > > >> >> > > Bryan and I can help with integration test with Spark. I
> >> > think
> >> > > > the
> >> > > > >> >> target
> >> > > > >> >> > > timeline for Spark 2.3 release is some time in mid Nov
> >> (Bryan
> >> > > > >> please
> >> > > > >> >> > > correct me if I am wrong).
> >> > > > >> >> > >
> >> > > > >> >> > > I will take a look at the PR today.
> >> > > > >> >> > >
> >> > > > >> >> > >
> >> > > > >> >> > >
> >> > > > >> >> > >
> >> > > > >> >> > >
> >> > > > >> >> > >
> >> > > > >> >> > > On Tue, Oct 10, 2017 at 4:29 PM, Siddharth Teotia <
> >> > > > >> >> siddha...@dremio.com>
> >> > > > >> >> > > wrote:
> >> > > > >> >> > >
> >> > > > >> >> > > > Hi All,
> >> > > > >> >> > > >
> >> > > > >> >> > > > I wanted to update everyone on state of this
> >> mini-project:
> >> > > > >> >> > > >
> >> > > > >> >> > > >
> >> > > > >> >> > > >    - Requirements document and initial design proposal
> >> were
> >> > > > sent
> >> > > > >> >> out to
> >> > > > >> >> > > the
> >> > > > >> >> > > >    community for review and we have received some good
> >> > > > feedback.
> >> > > > >> All
> >> > > > >> >> > > > required
> >> > > > >> >> > > >    docs are attached with corresponding JIRAs.
> >> > > > >> >> > > >
> >> > > > >> >> > > >
> >> > > > >> >> > > >    - The initial prototype is in a reasonable state
> >> > > > >> (code-complete).
> >> > > > >> >> > You
> >> > > > >> >> > > >    can see the PR here -
> https://github.com/apache/arro
> >> > > > >> w/pull/1164
> >> > > > >> >> > > >
> >> > > > >> >> > > >
> >> > > > >> >> > > >    - The prototype has code changes for the new
> >> hierarchy,
> >> > > > >> abstract
> >> > > > >> >> > > >    interfaces for fixed width and variable width
> vectors
> >> > and
> >> > > > >> >> concrete
> >> > > > >> >> > > >    implementation of NullableIntVector and
> >> > > > NullableVarCharVector.
> >> > > > >> >> > > >
> >> > > > >> >> > > >
> >> > > > >> >> > > > Plan for testing and integrating into existing
> >> > > infrastructure:
> >> > > > >> >> > > >
> >> > > > >> >> > > >
> >> > > > >> >> > > >    - My initial thoughts are that this particular
> patch
> >> > will
> >> > > > >> >> require a
> >> > > > >> >> > > lot
> >> > > > >> >> > > >    of testing, reviews etc since the foundation of
> rest
> >> of
> >> > > the
> >> > > > >> >> > > > implementation
> >> > > > >> >> > > >    more or less depends on how the APIs are flushed
> out
> >> > here.
> >> > > > >> >> > > >
> >> > > > >> >> > > >
> >> > > > >> >> > > >    - So the goal is to get this properly tested and
> >> merged
> >> > > into
> >> > > > >> >> master
> >> > > > >> >> > > >    first.
> >> > > > >> >> > > >
> >> > > > >> >> > > >
> >> > > > >> >> > > >    - The idea is to slowly deprecate and remove the
> >> > existing
> >> > > > >> >> vectors in
> >> > > > >> >> > > >    stages. In this patch itself, we change the
> existing
> >> > > > >> >> > > >    NullableValueVectors.java template to generate
> >> > > > >> >> > LegacyNullableIntVector
> >> > > > >> >> > > > and
> >> > > > >> >> > > >    LegacyNullableVarCharVector. Each operation on
> these
> >> > > vectors
> >> > > > >> will
> >> > > > >> >> > > > delegate
> >> > > > >> >> > > >    to the corresponding NullableIntVector and
> >> > > > >> NullableVarCharVector
> >> > > > >> >> > that
> >> > > > >> >> > > > are
> >> > > > >> >> > > >    newly implemented.
> >> > > > >> >> > > >
> >> > > > >> >> > > >
> >> > > > >> >> > > >    - This achieves two goals w.r.t testing:
> >> > > > >> >> > > >
> >> > > > >> >> > > >
> >> > > > >> >> > > >    - Firstly, our existing JAVA unit tests will
> >> > automatically
> >> > > > >> >> exercise
> >> > > > >> >> > > the
> >> > > > >> >> > > >       newly written code and its APIs (API names have
> not
> >> > > > >> changed)
> >> > > > >> >> for
> >> > > > >> >> > > >       NullableInt and NullableVarChar vectors.
> >> > > > >> >> > > >
> >> > > > >> >> > > >
> >> > > > >> >> > > >    - Secondly, let's say we rebase Dremio on top of
> Arrow
> >> > > > master
> >> > > > >> and
> >> > > > >> >> > > >       replace all references to NullableIntVector and
> >> > > > >> >> > > > NullableVarCharVector with
> >> > > > >> >> > > >       their Legacy counterparts, things should still
> >> work.
> >> > > > >> >> > > >
> >> > > > >> >> > > >
> >> > > > >> >> > > >    - After this patch gets merged, we can do the
> >> following
> >> > > work
> >> > > > >> in
> >> > > > >> >> > > multiple
> >> > > > >> >> > > >    patches:
> >> > > > >> >> > > >       - Write concrete implementations for rest of the
> >> > > nullable
> >> > > > >> >> types
> >> > > > >> >> > --
> >> > > > >> >> > > >       FLOAT4, FLOAT8, BIGINT, VARBINARY etc
> >> > > > >> >> > > >
> >> > > > >> >> > > >
> >> > > > >> >> > > >    - Write additional tests (definitely needed but the
> >> > first
> >> > > > goal
> >> > > > >> >> is to
> >> > > > >> >> > > >       make sure existing tests are not broken).
> >> > > > >> >> > > >
> >> > > > >> >> > > >
> >> > > > >> >> > > >    - Ensure NullableValueVectors template generates
> >> Legacy
> >> > > > >> vectors
> >> > > > >> >> and
> >> > > > >> >> > > each
> >> > > > >> >> > > >       operation is merely a delegation to the API in
> new
> >> > > > >> >> > implementation.
> >> > > > >> >> > > >
> >> > > > >> >> > > >
> >> > > > >> >> > > >    - In the next Arrow release, remove all Legacy
> vectors
> >> > and
> >> > > > >> >> > > >       NullableValueVectors template since we will have
> >> the
> >> > > > >> >> > implementation
> >> > > > >> >> > > > for
> >> > > > >> >> > > >       each type that passes existing tests.
> >> > > > >> >> > > >
> >> > > > >> >> > > >
> >> > > > >> >> > > >    - I am currently inspecting the newly written code
> and
> >> > > > making
> >> > > > >> >> > changes
> >> > > > >> >> > > to
> >> > > > >> >> > > >       the template to generate Legacy vector types for
> >> > > Nullable
> >> > > > >> Int
> >> > > > >> >> > > > and Nullable
> >> > > > >> >> > > >       VarChar and delegating the operations. The
> changes
> >> > > should
> >> > > > >> be
> >> > > > >> >> > > > available in
> >> > > > >> >> > > >       the PR in a couple of hours.
> >> > > > >> >> > > >
> >> > > > >> >> > > >
> >> > > > >> >> > > > I am wondering if there are any other ideas around
> >> testing,
> >> > > > >> merging
> >> > > > >> >> > etc.
> >> > > > >> >> > > > Please feel free to reply here or comment on the PR.
> >> > > > >> >> > > >
> >> > > > >> >> > > > I would appreciate if people can take time to review
> the
> >> > code
> >> > > > in
> >> > > > >> PR
> >> > > > >> >> --
> >> > > > >> >> > > > especially the abstract classes BaseNullableFixedWidth
> >> and
> >> > > > >> >> > > > BaseNullableVariableWidth. Writing concrete
> >> implementations
> >> > > for
> >> > > > >> >> other
> >> > > > >> >> > > types
> >> > > > >> >> > > > will be much less hassle if these abstract classes
> have
> >> > > proper
> >> > > > >> code.
> >> > > > >> >> > > >
> >> > > > >> >> > > > Thanks,
> >> > > > >> >> > > > Siddharth
> >> > > > >> >> > > >
> >> > > > >> >> > >
> >> > > > >> >> >
> >> > > > >> >>
> >> > > > >> >
> >> > > > >> >
> >> > > > >>
> >> > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
>

Reply via email to