Re: Timeline for 0.15.0 release

Ji Liu Mon, 19 Aug 2019 10:26:07 -0700

Hi, Wes, on the java side, I can think of several bugs that need to be fixed or 
reminded.


i. ARROW-6040: Dictionary entries are required in IPC streams even when empty[1]
This one is under review now, however through this PR we find that there seems 
a bug in java reading and writing dictionaries in IPC which is Inconsistent 
with spec[2] since it assumes all dictionaries are at the start of stream (see 
details in PR comments,  and this fix may not catch up with version 0.15). 
@Micah Kornfield

ii. ARROW-1875: Write 64-bit ints as strings in integration test JSON files[3]
Java side code already checked in, other implementations seems not.

iii. ARROW-6202: OutOfMemory in JdbcAdapter[4]
Caused by trying to load all records in one contiguous batch, fixed by 
providing iterator API for iteratively reading in ARROW-6219[5].

Thanks,
Ji Liu

[1] https://github.com/apache/arrow/pull/4960
[2] https://arrow.apache.org/docs/ipc.html
[3] https://issues.apache.org/jira/browse/ARROW-1875
[4] https://issues.apache.org/jira/browse/ARROW-6202[5] 
https://issues.apache.org/jira/browse/ARROW-6219



------------------------------------------------------------------
From:Wes McKinney <[email protected]>
Send Time:2019年8月19日(星期一) 23:03
To:dev <[email protected]>
Subject:Re: Timeline for 0.15.0 release

I'm going to work some on organizing the 0.15.0 backlog some this
week, if anyone wants to help with grooming (particularly for
languages other than C++/Python where I'm focusing) that would be
helpful. There have been almost 500 JIRA issues opened since the
0.14.0 release, so we should make sure to check whether there's any
regressions or other serious bugs that we should try to fix for
0.15.0.

On Thu, Aug 15, 2019 at 6:23 PM Wes McKinney <[email protected]> wrote:
>
> The Windows wheel issue in 0.14.1 seems to be
>
> https://issues.apache.org/jira/browse/ARROW-6015
>
> I think the root cause could be the Windows changes in
>
> https://github.com/apache/arrow/commit/223ae744cc2a12c60cecb5db593263a03c13f85a
>
> I would be appreciative if a volunteer would look into what was wrong
> with the 0.14.1 wheels on Windows. Otherwise 0.15.0 Windows wheels
> will be broken, too
>
> The bad wheels can be found at
>
> https://bintray.com/apache/arrow/python#files/python%2F0.14.1
>
> On Thu, Aug 15, 2019 at 1:28 PM Antoine Pitrou <[email protected]> wrote:
> >
> > On Thu, 15 Aug 2019 11:17:07 -0700
> > Micah Kornfield <[email protected]> wrote:
> > > >
> > > > In C++ they are
> > > > independent, we could have 32-bit array lengths and variable-length
> > > > types with 64-bit offsets if we wanted (we just wouldn't be able to
> > > > have a List child with more than INT32_MAX elements).
> > >
> > > I think the point is we could do this in C++ but we don't.  I'm not sure 
> > > we
> > > would have introduced the "Large" types if we did.
> >
> > 64-bit offsets take twice as much space as 32-bit offsets, so if you're
> > storing lots of small-ish lists or strings, 32-bit offsets are
> > preferrable.  So even with 64-bit array lengths from the start it would
> > still be beneficial to have types with 32-bit offsets.
> >
> > > Going with the limited address space in Java and calling it a reference
> > > implementation seems suboptimal. If a consumer uses a "Large" type
> > > presumably it is because they need the ability to store more than 
> > > INT32_MAX
> > > child elements in a column, otherwise it is just wasting space [1].
> >
> > Probably. Though if the individual elements (lists or strings) are
> > large, not much space is wasted in proportion, so it may be simpler in
> > such a case to always create a "Large" type array.
> >
> > > [1] I suppose theoretically there might be some performance benefits on
> > > 64-bit architectures to using the native word sizes.
> >
> > Concretely, common 64-bit architectures don't do that, as 32-bit is an
> > extremely common integer size even in high-performance code.
> >
> > Regards
> >
> > Antoine.
> >
> >

Re: Timeline for 0.15.0 release

Reply via email to