Re: Timeline for 0.15.0 release

Micah Kornfield Tue, 10 Sep 2019 20:24:30 -0700

I should have a little more bandwidth to help with some of the packaging
starting tomorrow and going into the weekend.


On Tuesday, September 10, 2019, Wes McKinney <[email protected]> wrote:

> Hi folks,
>
> With the state of nightly packaging and integration builds things aren't
> looking too good for being in release readiness by the end of this week but
> maybe I'm wrong. I'm planning to be working to close as many issues as I
> can and also to help with the ongoing alignment fixes.
>
> Wes
>
> On Thu, Sep 5, 2019, 11:07 PM Micah Kornfield <[email protected]>
> wrote:
>
>> Just for reference [1] has a dashboard of the current issues:
>>
>> https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.15.0+Release
>>
>> On Thu, Sep 5, 2019 at 3:43 PM Wes McKinney <[email protected]> wrote:
>>
>>> hi all,
>>>
>>> It doesn't seem like we're going to be in a position to release at the
>>> beginning of next week. I hope that one more week of work (or less)
>>> will be enough to get us there. Aside from merging the alignment
>>> changes, we need to make sure that our packaging jobs required for the
>>> release candidate are all working.
>>>
>>> If folks could remove issues from the 0.15.0 backlog that they don't
>>> think they will finish by end of next week that would help focus
>>> efforts (there are currently 78 issues in 0.15.0 still). I am looking
>>> to tackle a few small features related to dictionaries while the
>>> release window is still open.
>>>
>>> - Wes
>>>
>>> On Tue, Aug 27, 2019 at 3:48 PM Wes McKinney <[email protected]>
>>> wrote:
>>> >
>>> > hi,
>>> >
>>> > I think we should try to release the week of September 9, so
>>> > development work should be completed by end of next week.
>>> >
>>> > Does that seem reasonable?
>>> >
>>> > I plan to get up a patch for the protocol alignment changes for C++ in
>>> > the next couple of days -- I think that getting the alignment work
>>> > done is the main barrier to releasing.
>>> >
>>> > Thanks
>>> > Wes
>>> >
>>> > On Mon, Aug 19, 2019 at 12:25 PM Ji Liu <[email protected]>
>>> wrote:
>>> > >
>>> > > Hi Wes, on the Java side, I can think of several bugs that need to
>>> be fixed or flagged.
>>> > >
>>> > > i. ARROW-6040: Dictionary entries are required in IPC streams even
>>> when empty[1]
>>> > > This one is under review now. However, through this PR we found what
>>> seems to be a bug in how Java reads and writes dictionaries in IPC: it is
>>> inconsistent with the spec[2] because it assumes all dictionaries are at the
>>> start of the stream (see details in the PR comments; this fix may not make it
>>> into version 0.15). @Micah Kornfield
>>> > >
>>> > > ii. ARROW-1875: Write 64-bit ints as strings in integration test
>>> JSON files[3]
>>> > > The Java-side code is already checked in; other implementations seem not to be.
>>> > >
>>> > > iii. ARROW-6202: OutOfMemory in JdbcAdapter[4]
>>> > > Caused by trying to load all records in one contiguous batch; fixed
>>> in ARROW-6219[5] by providing an iterator API for reading incrementally.
>>> > >
>>> > > Thanks,
>>> > > Ji Liu
>>> > >
>>> > > [1] https://github.com/apache/arrow/pull/4960
>>> > > [2] https://arrow.apache.org/docs/ipc.html
>>> > > [3] https://issues.apache.org/jira/browse/ARROW-1875
>>> > > [4] https://issues.apache.org/jira/browse/ARROW-6202
>>> > > [5] https://issues.apache.org/jira/browse/ARROW-6219
>>> > >
>>> > >
>>> > >
>>> > > ------------------------------------------------------------------
>>> > > From:Wes McKinney <[email protected]>
>>> > > Send Time: August 19, 2019 (Monday) 23:03
>>> > > To:dev <[email protected]>
>>> > > Subject:Re: Timeline for 0.15.0 release
>>> > >
>>> > > I'm going to work some on organizing the 0.15.0 backlog this
>>> > > week, if anyone wants to help with grooming (particularly for
>>> > > languages other than C++/Python where I'm focusing) that would be
>>> > > helpful. There have been almost 500 JIRA issues opened since the
>>> > > 0.14.0 release, so we should make sure to check whether there are any
>>> > > regressions or other serious bugs that we should try to fix for
>>> > > 0.15.0.
>>> > >
>>> > > On Thu, Aug 15, 2019 at 6:23 PM Wes McKinney <[email protected]>
>>> wrote:
>>> > > >
>>> > > > The Windows wheel issue in 0.14.1 seems to be
>>> > > >
>>> > > > https://issues.apache.org/jira/browse/ARROW-6015
>>> > > >
>>> > > > I think the root cause could be the Windows changes in
>>> > > >
>>> > > > https://github.com/apache/arrow/commit/223ae744cc2a12c60cecb5db593263a03c13f85a
>>> > > >
>>> > > > I would be appreciative if a volunteer would look into what was
>>> wrong
>>> > > > with the 0.14.1 wheels on Windows. Otherwise 0.15.0 Windows wheels
>>> > > > will be broken, too.
>>> > > >
>>> > > > The bad wheels can be found at
>>> > > >
>>> > > > https://bintray.com/apache/arrow/python#files/python%2F0.14.1
>>> > > >
>>> > > > On Thu, Aug 15, 2019 at 1:28 PM Antoine Pitrou <
>>> [email protected]> wrote:
>>> > > > >
>>> > > > > On Thu, 15 Aug 2019 11:17:07 -0700
>>> > > > > Micah Kornfield <[email protected]> wrote:
>>> > > > > > >
>>> > > > > > > In C++ they are
>>> > > > > > > independent, we could have 32-bit array lengths and
>>> variable-length
>>> > > > > > > types with 64-bit offsets if we wanted (we just wouldn't be
>>> able to
>>> > > > > > > have a List child with more than INT32_MAX elements).
>>> > > > > >
>>> > > > > > I think the point is we could do this in C++ but we don't.
>>> I'm not sure we
>>> > > > > > would have introduced the "Large" types if we did.
>>> > > > >
>>> > > > > 64-bit offsets take twice as much space as 32-bit offsets, so if
>>> you're
>>> > > > > storing lots of small-ish lists or strings, 32-bit offsets are
>>> > > > > preferable.  So even with 64-bit array lengths from the start
>>> it would
>>> > > > > still be beneficial to have types with 32-bit offsets.
>>> > > > >
>>> > > > > > Going with the limited address space in Java and calling it a
>>> reference
>>> > > > > > implementation seems suboptimal. If a consumer uses a "Large"
>>> type
>>> > > > > > presumably it is because they need the ability to store more
>>> than INT32_MAX
>>> > > > > > child elements in a column, otherwise it is just wasting space
>>> [1].
>>> > > > >
>>> > > > > Probably. Though if the individual elements (lists or strings)
>>> are
>>> > > > > large, not much space is wasted in proportion, so it may be
>>> simpler in
>>> > > > > such a case to always create a "Large" type array.
>>> > > > >
>>> > > > > > [1] I suppose theoretically there might be some performance
>>> benefits on
>>> > > > > > 64-bit architectures to using the native word sizes.
>>> > > > >
>>> > > > > Concretely, common 64-bit architectures don't do that, as 32-bit
>>> is an
>>> > > > > extremely common integer size even in high-performance code.
>>> > > > >
>>> > > > > Regards
>>> > > > >
>>> > > > > Antoine.
>>> > > > >
>>> > > > >
>>> > >
>>>
>>
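
As a rough illustration of the offset-width trade-off discussed in the
thread above (32-bit vs. 64-bit offsets for variable-length types), here is
a minimal Python sketch. It uses only the standard library and does not call
any Arrow API; the helper names (build_offsets, offsets_nbytes,
overhead_ratio) are hypothetical, and it assumes the usual layout in which
an array of n variable-length values carries n + 1 offsets.

```python
import struct

INT32_MAX = 2**31 - 1


def build_offsets(values):
    """Return the n + 1 offsets for a list of n byte strings."""
    offsets = [0]
    for v in values:
        offsets.append(offsets[-1] + len(v))
    return offsets


def offsets_nbytes(offsets, width_bits):
    """Size in bytes of the offsets when packed as 32- or 64-bit integers."""
    fmt = {32: "i", 64: "q"}[width_bits]
    # "<" selects standard sizes: 4 bytes for "i", 8 bytes for "q".
    return len(struct.pack(f"<{len(offsets)}{fmt}", *offsets))


def fits_int32(offsets):
    """True if the last offset (total data size) is representable in 32 bits."""
    return offsets[-1] <= INT32_MAX


def overhead_ratio(values, width_bits):
    """Offsets size divided by total data size (values must be non-empty)."""
    offsets = build_offsets(values)
    return offsets_nbytes(offsets, width_bits) / offsets[-1]


if __name__ == "__main__":
    small = [b"ab"] * 1000          # many small strings
    large = [b"x" * 10000] * 10     # a few large values
    for name, vals in (("small", small), ("large", large)):
        for bits in (32, 64):
            print(name, bits, round(overhead_ratio(vals, bits), 5))
```

For 1000 two-byte strings the offsets take 4004 bytes at 32 bits and 8008
bytes at 64 bits, against 2000 bytes of data, so the offset width dominates.
For ten 10000-byte values the offsets are under 0.1% of the data size at
either width, which matches the point that little space is wasted in
proportion when individual elements are large.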
