Re: Timeline for 0.15.0 release

Wes McKinney Tue, 10 Sep 2019 16:12:25 -0700

Hi folks,

Given the state of the nightly packaging and integration builds, things
aren't looking good for release readiness by the end of this week, but
maybe I'm wrong. I plan to work on closing as many issues as I can and to
help with the ongoing alignment fixes.


Wes

On Thu, Sep 5, 2019, 11:07 PM Micah Kornfield <[email protected]> wrote:

> For reference, there is a dashboard of the current issues at:
>
> https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.15.0+Release
>
> On Thu, Sep 5, 2019 at 3:43 PM Wes McKinney <[email protected]> wrote:
>
>> hi all,
>>
>> It doesn't seem like we're going to be in a position to release at the
>> beginning of next week. I hope that one more week of work (or less)
>> will be enough to get us there. Aside from merging the alignment
>> changes, we need to make sure that our packaging jobs required for the
>> release candidate are all working.
>>
>> If folks could remove issues from the 0.15.0 backlog that they don't
>> think they will finish by the end of next week, that would help focus
>> efforts (there are still 78 issues in 0.15.0). I am looking to tackle
>> a few small features related to dictionaries while the release window
>> is still open.
>>
>> - Wes
>>
>> On Tue, Aug 27, 2019 at 3:48 PM Wes McKinney <[email protected]> wrote:
>> >
>> > hi,
>> >
>> > I think we should try to release the week of September 9, so
>> > development work should be completed by end of next week.
>> >
>> > Does that seem reasonable?
>> >
>> > I plan to put up a patch for the protocol alignment changes for C++ in
>> > the next couple of days. I think that getting the alignment work done
>> > is the main barrier to releasing.
>> >
>> > Thanks
>> > Wes
>> >
>> > On Mon, Aug 19, 2019 at 12:25 PM Ji Liu <[email protected]> wrote:
>> > >
>> > > Hi Wes, on the Java side I can think of several bugs that need to be
>> > > fixed or are worth flagging:
>> > >
>> > > i. ARROW-6040: Dictionary entries are required in IPC streams even
>> > > when empty [1]
>> > > This one is under review now. However, this PR revealed what seems to
>> > > be a bug in how Java reads and writes dictionaries in IPC: it is
>> > > inconsistent with the spec [2] because it assumes all dictionaries are
>> > > at the start of the stream (see details in the PR comments; this fix
>> > > may not make it into 0.15). @Micah Kornfield
>> > >
>> > > ii. ARROW-1875: Write 64-bit ints as strings in integration test JSON
>> > > files [3]
>> > > The Java-side code is already checked in; the other implementations do
>> > > not seem to have it yet.
>> > >
>> > > iii. ARROW-6202: OutOfMemory in JdbcAdapter [4]
>> > > This is caused by trying to load all records into one contiguous batch.
>> > > It is fixed by providing an iterator API for reading records
>> > > incrementally in ARROW-6219 [5].
>> > >
>> > > Thanks,
>> > > Ji Liu
>> > >
>> > > [1] https://github.com/apache/arrow/pull/4960
>> > > [2] https://arrow.apache.org/docs/ipc.html
>> > > [3] https://issues.apache.org/jira/browse/ARROW-1875
>> > > [4] https://issues.apache.org/jira/browse/ARROW-6202
>> > > [5] https://issues.apache.org/jira/browse/ARROW-6219
>> > >
>> > >
>> > >
>> > > ------------------------------------------------------------------
>> > > From: Wes McKinney <[email protected]>
>> > > Sent: Monday, August 19, 2019 23:03
>> > > To: dev <[email protected]>
>> > > Subject: Re: Timeline for 0.15.0 release
>> > >
>> > > I'm going to spend some time organizing the 0.15.0 backlog this week.
>> > > If anyone wants to help with grooming (particularly for languages other
>> > > than C++/Python, where I'm focusing), that would be helpful. Almost 500
>> > > JIRA issues have been opened since the 0.14.0 release, so we should
>> > > check whether there are any regressions or other serious bugs that we
>> > > should try to fix for 0.15.0.
>> > >
>> > > On Thu, Aug 15, 2019 at 6:23 PM Wes McKinney <[email protected]> wrote:
>> > > >
>> > > > The Windows wheel issue in 0.14.1 seems to be
>> > > >
>> > > > https://issues.apache.org/jira/browse/ARROW-6015
>> > > >
>> > > > I think the root cause could be the Windows changes in
>> > > >
>> > > > https://github.com/apache/arrow/commit/223ae744cc2a12c60cecb5db593263a03c13f85a
>> > > >
>> > > > I would appreciate it if a volunteer could look into what was wrong
>> > > > with the 0.14.1 wheels on Windows. Otherwise, the 0.15.0 Windows
>> > > > wheels will be broken, too.
>> > > >
>> > > > The bad wheels can be found at
>> > > >
>> > > > https://bintray.com/apache/arrow/python#files/python%2F0.14.1
>> > > >
>> > > > On Thu, Aug 15, 2019 at 1:28 PM Antoine Pitrou <[email protected]> wrote:
>> > > > >
>> > > > > On Thu, 15 Aug 2019 11:17:07 -0700
>> > > > > Micah Kornfield <[email protected]> wrote:
>> > > > > > >
>> > > > > > > In C++ they are independent; we could have 32-bit array lengths
>> > > > > > > and variable-length types with 64-bit offsets if we wanted (we
>> > > > > > > just wouldn't be able to have a List child with more than
>> > > > > > > INT32_MAX elements).
>> > > > > >
>> > > > > > I think the point is that we could do this in C++ but we don't.
>> > > > > > I'm not sure we would have introduced the "Large" types if we did.
>> > > > >
>> > > > > 64-bit offsets take twice as much space as 32-bit offsets, so if
>> > > > > you're storing lots of small-ish lists or strings, 32-bit offsets are
>> > > > > preferable. So even with 64-bit array lengths from the start, it
>> > > > > would still be beneficial to have types with 32-bit offsets.
>> > > > >
>> > > > > > Going with the limited address space in Java and calling it a
>> > > > > > reference implementation seems suboptimal. If a consumer uses a
>> > > > > > "Large" type, presumably it is because they need the ability to
>> > > > > > store more than INT32_MAX child elements in a column; otherwise it
>> > > > > > is just wasting space [1].
>> > > > >
>> > > > > Probably. Though if the individual elements (lists or strings) are
>> > > > > large, not much space is wasted in proportion, so in such a case it
>> > > > > may be simpler to always create a "Large" type array.
>> > > > >
>> > > > > > [1] I suppose there might theoretically be some performance
>> > > > > > benefits to using native word sizes on 64-bit architectures.
>> > > > >
>> > > > > In practice, common 64-bit architectures don't penalize 32-bit
>> > > > > integers, as 32-bit is an extremely common integer size even in
>> > > > > high-performance code.
>> > > > >
>> > > > > Regards
>> > > > >
>> > > > > Antoine.
>> > > > >
>> > > > >
>> > >
>>
>
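A little arithmetic makes the offset-width tradeoff from the quoted discussion concrete. The Python sketch below is illustrative only: the function names are hypothetical and not part of any Arrow API. It assumes the variable-length layout from the Arrow columnar format, where a column of n values has n + 1 offsets, each 4 bytes (32-bit offsets, e.g. String) or 8 bytes (64-bit offsets, e.g. LargeString), and it ignores validity bitmaps and buffer padding.

```python
# Illustrative sketch only: hypothetical helpers, not an Arrow API.
# Assumes Arrow's variable-length layout: n values need n + 1 offsets,
# each 4 bytes (32-bit offsets) or 8 bytes (64-bit "Large" offsets).
# Validity bitmaps and buffer padding/alignment are ignored.

INT32_MAX = 2**31 - 1


def offsets_nbytes(num_values: int, offset_width: int) -> int:
    """Size of the offsets buffer for num_values values."""
    return (num_values + 1) * offset_width


def column_nbytes(num_values: int, avg_value_len: int, offset_width: int) -> int:
    """Offsets buffer plus a data buffer holding avg_value_len bytes per value."""
    return offsets_nbytes(num_values, offset_width) + num_values * avg_value_len


def large_to_small_ratio(num_values: int, avg_value_len: int) -> float:
    """How much bigger the 64-bit-offset column is than the 32-bit-offset one."""
    return column_nbytes(num_values, avg_value_len, 8) / column_nbytes(
        num_values, avg_value_len, 4
    )


def fits_in_32bit_offsets(data_len: int) -> bool:
    """Signed 32-bit offsets can reach at most INT32_MAX bytes (strings)
    or child elements (lists)."""
    return data_len <= INT32_MAX


if __name__ == "__main__":
    n = 1_000_000
    for avg_len in (4, 1_000):
        ratio = large_to_small_ratio(n, avg_len)
        print(f"avg length {avg_len:>5} bytes: 64-bit/32-bit size ratio = {ratio:.3f}")
    print("3 GiB of string data fits 32-bit offsets:", fits_in_32bit_offsets(3 * 2**30))
```

With one million strings averaging 4 bytes each, the 64-bit-offset column comes out about 1.5x larger; at 1,000 bytes per string the difference drops below 0.5%, which matches the point that large elements waste little space in proportion.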

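Ji Liu's note on ARROW-6202 (running out of memory by loading all records into one contiguous batch, fixed by reading through an iterator) follows a general pattern. The Python sketch below shows that pattern in isolation; it is not the Java JdbcAdapter API from ARROW-6219, and `iter_batches` and the simulated result set are hypothetical.

```python
# Illustrative sketch of the incremental-reading pattern only; this is not
# the Java JdbcAdapter/ARROW-6219 API. iter_batches is a hypothetical helper.
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size rows.

    The generator itself holds only one batch at a time, so its peak memory
    is bounded by batch_size rather than by the size of the whole result set.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Stand-in for a database cursor: a lazy generator of (id, name) rows.
    result_set = ((i, f"name{i}") for i in range(10))
    for batch in iter_batches(result_set, batch_size=4):
        print(len(batch), batch[0])
```

Running it prints batches of 4, 4 and 2 rows; the consumer processes each batch before the next one is read, instead of holding all ten rows at once.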