Re: Timeline for 0.15.0 release

Wes McKinney Tue, 27 Aug 2019 13:49:41 -0700

hi,

I think we should try to release the week of September 9, so
development work should be completed by end of next week.


Does that seem reasonable?

I plan to put up a patch for the protocol alignment changes for C++ in
the next couple of days -- I think that getting the alignment work
done is the main barrier to releasing.

Thanks
Wes

On Mon, Aug 19, 2019 at 12:25 PM Ji Liu <[email protected]> wrote:
>
> Hi Wes, on the Java side, I can think of several bugs that need to be fixed
> or at least called out.
>
> i. ARROW-6040: Dictionary entries are required in IPC streams even when
> empty[1]
> This one is under review now. However, while working on this PR we found what
> seems to be a bug in how Java reads and writes dictionaries in IPC: it is
> inconsistent with the spec[2], since it assumes all dictionaries are at the
> start of the stream (see details in the PR comments). This fix may not make
> it into version 0.15. @Micah Kornfield
>
> ii. ARROW-1875: Write 64-bit ints as strings in integration test JSON files[3]
> The Java-side code is already checked in; the other implementations do not
> seem to have it yet.
>
> iii. ARROW-6202: OutOfMemory in JdbcAdapter[4]
> Caused by trying to load all records in one contiguous batch; fixed by
> providing an iterator API for reading iteratively, in ARROW-6219[5].
>
> Thanks,
> Ji Liu
>
> [1] https://github.com/apache/arrow/pull/4960
> [2] https://arrow.apache.org/docs/ipc.html
> [3] https://issues.apache.org/jira/browse/ARROW-1875
> [4] https://issues.apache.org/jira/browse/ARROW-6202
> [5] https://issues.apache.org/jira/browse/ARROW-6219
>
>
>
> ------------------------------------------------------------------
> From: Wes McKinney <[email protected]>
> Send Time: Monday, August 19, 2019, 23:03
> To: dev <[email protected]>
> Subject: Re: Timeline for 0.15.0 release
>
> I'm going to work on organizing the 0.15.0 backlog this
> week. If anyone wants to help with grooming (particularly for
> languages other than C++/Python, where I'm focusing), that would be
> helpful. There have been almost 500 JIRA issues opened since the
> 0.14.0 release, so we should make sure to check whether there are any
> regressions or other serious bugs that we should try to fix for
> 0.15.0.
>
> On Thu, Aug 15, 2019 at 6:23 PM Wes McKinney <[email protected]> wrote:
> >
> > The Windows wheel issue in 0.14.1 seems to be
> >
> > https://issues.apache.org/jira/browse/ARROW-6015
> >
> > I think the root cause could be the Windows changes in
> >
> > https://github.com/apache/arrow/commit/223ae744cc2a12c60cecb5db593263a03c13f85a
> >
> > I would be appreciative if a volunteer would look into what was wrong
> > with the 0.14.1 wheels on Windows. Otherwise 0.15.0 Windows wheels
> > will be broken, too.
> >
> > The bad wheels can be found at
> >
> > https://bintray.com/apache/arrow/python#files/python%2F0.14.1
> >
> > On Thu, Aug 15, 2019 at 1:28 PM Antoine Pitrou <[email protected]> wrote:
> > >
> > > On Thu, 15 Aug 2019 11:17:07 -0700
> > > Micah Kornfield <[email protected]> wrote:
> > > > >
> > > > > In C++ they are
> > > > > independent, we could have 32-bit array lengths and variable-length
> > > > > types with 64-bit offsets if we wanted (we just wouldn't be able to
> > > > > have a List child with more than INT32_MAX elements).
> > > >
> > > > I think the point is we could do this in C++ but we don't. I'm not sure
> > > > we would have introduced the "Large" types if we did.
> > >
> > > 64-bit offsets take twice as much space as 32-bit offsets, so if you're
> > > storing lots of small-ish lists or strings, 32-bit offsets are
> > > preferable.  So even with 64-bit array lengths from the start it would
> > > still be beneficial to have types with 32-bit offsets.
> > >
> > > > Going with the limited address space in Java and calling it a reference
> > > > implementation seems suboptimal. If a consumer uses a "Large" type
> > > > presumably it is because they need the ability to store more than
> > > > INT32_MAX child elements in a column, otherwise it is just wasting
> > > > space [1].
> > >
> > > Probably. Though if the individual elements (lists or strings) are
> > > large, not much space is wasted in proportion, so it may be simpler in
> > > such a case to always create a "Large" type array.
> > >
> > > > [1] I suppose theoretically there might be some performance benefits on
> > > > 64-bit architectures to using the native word sizes.
> > >
> > > Concretely, common 64-bit architectures don't do that, as 32-bit is an
> > > extremely common integer size even in high-performance code.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
>
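On the offsets discussion quoted above: Antoine's point that 64-bit offsets
take twice the space of 32-bit offsets, and that the waste is small in
proportion when individual elements are large, can be checked with a little
arithmetic. This is a rough sketch, assuming an offsets buffer with one more
entry than the number of values. The function names are made up for
illustration and are not part of any Arrow library.

```python
INT32_MAX = 2**31 - 1


def offsets_buffer_bytes(num_values, offset_width_bits):
    """Bytes used by an offsets buffer holding num_values + 1 entries."""
    if offset_width_bits not in (32, 64):
        raise ValueError("offset width must be 32 or 64 bits")
    return (num_values + 1) * (offset_width_bits // 8)


def offset_overhead(num_values, avg_element_bytes, offset_width_bits):
    """Fraction of (offsets + data) bytes that is spent on offsets."""
    offsets = offsets_buffer_bytes(num_values, offset_width_bits)
    data = num_values * avg_element_bytes
    return offsets / (offsets + data)


def needs_large_offsets(total_child_elements):
    """True when 32-bit offsets cannot address all child elements."""
    return total_child_elements > INT32_MAX


n = 1_000_000
small_32 = offset_overhead(n, 4, 32)     # many tiny strings, 32-bit offsets
small_64 = offset_overhead(n, 4, 64)     # same data, 64-bit offsets
large_64 = offset_overhead(n, 4096, 64)  # large elements, 64-bit offsets
```

With 4-byte elements the offsets are about half of the total at 32 bits and
about two thirds at 64 bits; with 4096-byte elements the 64-bit offsets are
well under one percent of the total.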
