>
> In C++ they are
> independent, we could have 32-bit array lengths and variable-length
> types with 64-bit offsets if we wanted (we just wouldn't be able to
> have a List child with more than INT32_MAX elements).

I think the point is we could do this in C++ but we don't.  I'm not sure we
would have introduced the "Large" types if we did.
We will have to do this in Java if we don't want to convert to 64-bit
addressing.
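
To make the "limited address space" option concrete, here is a rough sketch
of the kind of bounds check a reader would need when a "Large" type arrives
with 64-bit offsets but the implementation can only address INT32_MAX child
elements. The class and method names are made up for illustration and are
not part of the Arrow Java library; the sketch also assumes the offsets are
non-decreasing, so only the last one is inspected.

```java
public final class LargeOffsetCheck {
    private LargeOffsetCheck() {}

    /**
     * Hypothetical helper (not an Arrow API): given the 64-bit offsets of a
     * "Large" variable-length column, return the child length as an int, or
     * fail if it does not fit in a 32-bit address space.
     * Assumes offsets are non-decreasing, so the last offset is the largest.
     */
    static int checkedChildLength(long[] offsets) {
        if (offsets.length == 0) {
            return 0; // no offsets means no child elements
        }
        long last = offsets[offsets.length - 1];
        if (last < 0 || last > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "child length " + last + " exceeds 32-bit addressing");
        }
        return (int) last;
    }
}
```

Something like this would sit at the IPC boundary: columns whose offsets stay
in the lower 32-bit range are readable, and anything larger is rejected up
front rather than silently truncated.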

Going with the limited address space in Java and calling it a reference
implementation seems suboptimal. If a consumer uses a "Large" type
presumably it is because they need the ability to store more than INT32_MAX
child elements in a column, otherwise it is just wasting space [1].

Let's pause until next week when Jacques is back online (and continue on
the other thread).  Like I said, I think there is enough time either way to
get something in along the timeline we expect for the next release.

[1] I suppose theoretically there might be some performance benefits on
64-bit architectures to using the native word sizes.

On Thu, Aug 15, 2019 at 10:59 AM Wes McKinney <wesmck...@gmail.com> wrote:

> On Thu, Aug 15, 2019 at 12:00 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
> >
> > Hi Wes,
> > >
> > > Do these need to be dependent on the 64-bit array length discussion?
> >
> > We could hack something that can read the lower 32-bit range, so I guess
> > not, but this leaves a bad taste in my mouth.  I think there is likely
> > still enough time to have the discussion and get these implemented, one
> way
> > or another.
> >
>
> I guess I still don't understand how the array lengths and the
> List/Varchar offsets are related to each other. I probably just
> haven't looked at the Java library enough. In C++ they are
> independent, we could have 32-bit array lengths and variable-length
> types with 64-bit offsets if we wanted (we just wouldn't be able to
> have a List child with more than INT32_MAX elements). We would have to
> do a limited amount of boundschecking at IPC boundary points (like
> Java is checking presumably now for vectors exceeding INT32_MAX).
>
> > For the record, I don't think we should hold a major release hostage
> > > if we aren't able to complete various feature milestones in time.
> > > Since it's been about 5-6 weeks since 0.14.0 we're coming close to the
> > > desired 8-10 week timeline for major releases, so if we need to have
> > > 0.16.0 prior to 1.0.0, I think that is OK also.
> >
> > I agree with the time based milestones in practice, but we are
> backpedaling
> > on the intent to keep type parity between the two reference
> > implementations.  At least the way I read the previous threads on the
> > topic, I thought there was lazy consensus that in lieu of requiring
> working
> > implementations in Java and C++ be checked in at the same time, we would
> > rely on the release as a forcing function for parity.
> >
>
> I agree with the intent and spirit of the idea, but it seems we have a
> can of worms on our hands now and so I don't think we should keep from
> releasing the work that has been completed if consensus about Java
> changes is not reached in time.
>
> > Thanks,
> > Micah
> >
> > On Wed, Aug 14, 2019 at 11:32 AM Antoine Pitrou <anto...@python.org>
> wrote:
> >
> > >
> > > Agreed with Wes.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > Le 14/08/2019 à 20:30, Wes McKinney a écrit :
> > > > For the record, I don't think we should hold a major release hostage
> > > > if we aren't able to complete various feature milestones in time.
> > > > Since it's been about 5-6 weeks since 0.14.0 we're coming close to
> the
> > > > desired 8-10 week timeline for major releases, so if we need to have
> > > > 0.16.0 prior to 1.0.0, I think that is OK also.
> > > >
> > > > On Wed, Aug 14, 2019 at 11:45 AM Wes McKinney <wesmck...@gmail.com>
> > > wrote:
> > > >>
> > > >> On Wed, Aug 14, 2019 at 11:43 AM Micah Kornfield <
> emkornfi...@gmail.com>
> > > wrote:
> > > >>>
> > > >>>>
> > > >>>>  is there anything else that has come up that
> > > >>>> definitely needs to happen before we can release again?
> > > >>>
> > > >>> We need to decide on a way forward for LargeList, LargeBinary, etc,
> > > types...
> > > >>>
> > > >>
> > > >> Do these need to be dependent on the 64-bit array length discussion?
> > > >> They seem somewhat orthogonal to me. If we have to release 0.15.0
> > > >> without the Java side of these, that's OK with me, since reaching
> > > >> format implementation completeness is more of a 1.0.0 concern
> > > >>
> > > >>> On Tue, Aug 13, 2019 at 8:27 PM Wes McKinney <wesmck...@gmail.com>
> > > wrote:
> > > >>>
> > > >>>> hi folks,
> > > >>>>
> > > >>>> Since there have been a number of fairly serious issues (e.g.
> > > >>>> ARROW-6060) since 0.14.1 that have been fixed I think we should
> start
> > > >>>> planning of the next major release. Note that we still have some
> > > >>>> format-related work (the Flatbuffers alignment issue) that ought
> to be
> > > >>>> resolved (not a small task since it affects 4 or 5
> implementations),
> > > >>>> but aside from that, is there anything else that has come up that
> > > >>>> definitely needs to happen before we can release again?
> > > >>>>
> > > >>>> I would say cutting a release somewhere around the US Labor Day
> > > >>>> holiday (~the week after or so) would be called for.
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Wes
> > > >>>>
> > >
>
