hi Eric -- yes, that's correct. I'm planning to amend the Format docs today regarding the EOS issue and also to update the C++ library.
On Wed, Sep 11, 2019 at 11:21 AM Eric Erhardt <eric.erha...@microsoft.com> wrote:
>
> I assume the plan is to merge the ARROW-6313-flatbuffer-alignment branch into
> master before the 0.15 release, correct?
>
> BTW - I believe the C# alignment changes are ready to be merged into the
> alignment branch - https://github.com/apache/arrow/pull/5280/
>
> Eric
>
> -----Original Message-----
> From: Micah Kornfield <emkornfi...@gmail.com>
> Sent: Tuesday, September 10, 2019 10:24 PM
> To: Wes McKinney <wesmck...@gmail.com>
> Cc: dev <dev@arrow.apache.org>; niki.lj <niki...@aliyun.com>
> Subject: Re: Timeline for 0.15.0 release
>
> I should have a little more bandwidth to help with some of the packaging
> starting tomorrow and going into the weekend.
>
> On Tuesday, September 10, 2019, Wes McKinney <wesmck...@gmail.com> wrote:
>
> > Hi folks,
> >
> > With the state of nightly packaging and integration builds things
> > aren't looking too good for being in release readiness by the end of
> > this week but maybe I'm wrong. I'm planning to be working to close as
> > many issues as I can and also to help with the ongoing alignment fixes.
> >
> > Wes
> >
> > On Thu, Sep 5, 2019, 11:07 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> >> Just for reference [1] has a dashboard of the current issues:
> >>
> >> [1] https://cwiki.apache.org/confluence/display/ARROW/Arrow+0.15.0+Release
> >>
> >> On Thu, Sep 5, 2019 at 3:43 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >>
> >>> hi all,
> >>>
> >>> It doesn't seem like we're going to be in a position to release at
> >>> the beginning of next week.
> >>> I hope that one more week of work (or
> >>> less) will be enough to get us there. Aside from merging the
> >>> alignment changes, we need to make sure that our packaging jobs
> >>> required for the release candidate are all working.
> >>>
> >>> If folks could remove issues from the 0.15.0 backlog that they don't
> >>> think they will finish by end of next week that would help focus
> >>> efforts (there are currently 78 issues in 0.15.0 still). I am
> >>> looking to tackle a few small features related to dictionaries while
> >>> the release window is still open.
> >>>
> >>> - Wes
> >>>
> >>> On Tue, Aug 27, 2019 at 3:48 PM Wes McKinney <wesmck...@gmail.com>
> >>> wrote:
> >>> >
> >>> > hi,
> >>> >
> >>> > I think we should try to release the week of September 9, so
> >>> > development work should be completed by end of next week.
> >>> >
> >>> > Does that seem reasonable?
> >>> >
> >>> > I plan to get up a patch for the protocol alignment changes for
> >>> > C++ in the next couple of days -- I think that getting the
> >>> > alignment work done is the main barrier to releasing.
> >>> >
> >>> > Thanks
> >>> > Wes
> >>> >
> >>> > On Mon, Aug 19, 2019 at 12:25 PM Ji Liu <niki...@aliyun.com.invalid>
> >>> > wrote:
> >>> > >
> >>> > > Hi Wes, on the Java side, I can think of several bugs that need
> >>> > > to be fixed or called out.
> >>> > >
> >>> > > i. ARROW-6040: Dictionary entries are required in IPC streams even
> >>> > > when empty [1]
> >>> > > This one is under review now; however, through this PR we found that
> >>> > > there seems to be a bug in how Java reads and writes dictionaries in
> >>> > > IPC that is inconsistent with the spec [2], since it assumes all
> >>> > > dictionaries are at the start of the stream (see details in the PR
> >>> > > comments; this fix may not make it into 0.15). @Micah Kornfield
> >>> > >
> >>> > > ii. ARROW-1875: Write 64-bit ints as strings in integration test
> >>> > > JSON files [3]
> >>> > > The Java side code is already checked in; other implementations
> >>> > > do not seem to have it yet.
> >>> > >
> >>> > > iii. ARROW-6202: OutOfMemory in JdbcAdapter [4], caused by trying
> >>> > > to load all records in one contiguous batch; fixed by providing an
> >>> > > iterator API for iteratively reading in ARROW-6219 [5].
> >>> > >
> >>> > > Thanks,
> >>> > > Ji Liu
> >>> > >
> >>> > > [1] https://github.com/apache/arrow/pull/4960
> >>> > > [2] https://arrow.apache.org/docs/ipc.html
> >>> > > [3] https://issues.apache.org/jira/browse/ARROW-1875
> >>> > > [4] https://issues.apache.org/jira/browse/ARROW-6202
> >>> > > [5] https://issues.apache.org/jira/browse/ARROW-6219
> >>> > >
> >>> > > ------------------------------------------------------------------
> >>> > > From: Wes McKinney <wesmck...@gmail.com>
> >>> > > Send Time: Monday, August 19, 2019 23:03
> >>> > > To: dev <dev@arrow.apache.org>
> >>> > > Subject: Re: Timeline for 0.15.0 release
> >>> > >
> >>> > > I'm going to spend some time organizing the 0.15.0 backlog this
> >>> > > week; if anyone wants to help with grooming (particularly for
> >>> > > languages other than C++/Python, where I'm focusing) that would
> >>> > > be helpful. There have been almost 500 JIRA issues opened since
> >>> > > the 0.14.0 release, so we should make sure to check whether there
> >>> > > are any regressions or other serious bugs that we should try to
> >>> > > fix for 0.15.0.
> >>> > >
> >>> > > On Thu, Aug 15, 2019 at 6:23 PM Wes McKinney <wesmck...@gmail.com>
> >>> > > wrote:
> >>> > > >
> >>> > > > The Windows wheel issue in 0.14.1 seems to be
> >>> > > >
> >>> > > > https://issues.apache.org/jira/browse/ARROW-6015
> >>> > > >
> >>> > > > I think the root cause could be the Windows changes in
> >>> > > >
> >>> > > > https://github.com/apache/arrow/commit/223ae744cc2a12c60cecb5db593263a03c13f85a
> >>> > > >
> >>> > > > I would be appreciative if a volunteer would look into what
> >>> > > > was wrong with the 0.14.1 wheels on Windows.
> >>> > > > Otherwise, 0.15.0 Windows wheels will be broken, too.
> >>> > > >
> >>> > > > The bad wheels can be found at
> >>> > > >
> >>> > > > https://bintray.com/apache/arrow/python#files/python%2F0.14.1
> >>> > > >
> >>> > > > On Thu, Aug 15, 2019 at 1:28 PM Antoine Pitrou <solip...@pitrou.net>
> >>> > > > wrote:
> >>> > > > >
> >>> > > > > On Thu, 15 Aug 2019 11:17:07 -0700
> >>> > > > > Micah Kornfield <emkornfi...@gmail.com> wrote:
> >>> > > > > > >
> >>> > > > > > > In C++ they are independent: we could have 32-bit array
> >>> > > > > > > lengths and variable-length types with 64-bit offsets if we
> >>> > > > > > > wanted (we just wouldn't be able to have a List child with
> >>> > > > > > > more than INT32_MAX elements).
> >>> > > > > >
> >>> > > > > > I think the point is we could do this in C++ but we don't.
> >>> > > > > > I'm not sure we would have introduced the "Large" types if we did.
> >>> > > > >
> >>> > > > > 64-bit offsets take twice as much space as 32-bit offsets, so if
> >>> > > > > you're storing lots of small-ish lists or strings, 32-bit offsets
> >>> > > > > are preferable. So even with 64-bit array lengths from the start,
> >>> > > > > it would still be beneficial to have types with 32-bit offsets.
> >>> > > > >
> >>> > > > > > Going with the limited address space in Java and calling it a
> >>> > > > > > reference implementation seems suboptimal.
> >>> > > > > > If a consumer uses a "Large" type, presumably it is because
> >>> > > > > > they need the ability to store more than INT32_MAX child
> >>> > > > > > elements in a column; otherwise it is just wasting space [1].
> >>> > > > >
> >>> > > > > Probably. Though if the individual elements (lists or strings)
> >>> > > > > are large, not much space is wasted in proportion, so it may be
> >>> > > > > simpler in such a case to always create a "Large" type array.
> >>> > > > >
> >>> > > > > > [1] I suppose theoretically there might be some performance
> >>> > > > > > benefits on 64-bit architectures to using the native word sizes.
> >>> > > > >
> >>> > > > > Concretely, common 64-bit architectures don't do that, as 32-bit
> >>> > > > > is an extremely common integer size even in high-performance code.
> >>> > > > >
> >>> > > > > Regards
> >>> > > > >
> >>> > > > > Antoine.
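The space trade-off Antoine describes can be made concrete with a little arithmetic. The sketch below assumes the Arrow variable-length layout, in which an array of N values stores N + 1 offsets, and deliberately ignores the validity bitmap and buffer padding. The helper names (`offsets_bytes`, `total_bytes`, `fits_32bit_offsets`) are hypothetical illustrations, not functions from any Arrow library.

```python
# Hypothetical helpers illustrating 32-bit vs 64-bit offset overhead.
# Simplification: validity bitmaps and buffer padding are ignored.

INT32_MAX = 2**31 - 1


def offsets_bytes(num_values: int, offset_width: int) -> int:
    """Size of the offsets buffer: N values need N + 1 offsets."""
    return (num_values + 1) * offset_width


def total_bytes(num_values: int, avg_value_bytes: int, offset_width: int) -> int:
    """Offsets buffer plus value data for a string/list-like column."""
    return offsets_bytes(num_values, offset_width) + num_values * avg_value_bytes


def fits_32bit_offsets(total_child_elements: int) -> bool:
    """Signed 32-bit offsets can address at most INT32_MAX child elements."""
    return total_child_elements <= INT32_MAX


# One million short (5-byte) strings: offsets dominate the footprint.
small32 = total_bytes(1_000_000, 5, 4)  # 9,000,004 bytes
small64 = total_bytes(1_000_000, 5, 8)  # 13,000,008 bytes

# One thousand large (1 MB) strings: the wider offsets barely matter.
large32 = total_bytes(1_000, 1_000_000, 4)
large64 = total_bytes(1_000, 1_000_000, 8)

if __name__ == "__main__":
    print(f"small strings: {small32:,} vs {small64:,} bytes ({small64 / small32:.2f}x)")
    print(f"large strings: {large32:,} vs {large64:,} bytes ({large64 / large32:.6f}x)")
    print("3 billion child elements fit in 32-bit offsets:",
          fits_32bit_offsets(3_000_000_000))
```

Under these assumptions, 64-bit offsets add roughly 44% to a column of many 5-byte strings but only a few kilobytes to a gigabyte of large strings, which matches the point that "Large" types mainly pay off when a column needs more than INT32_MAX child elements.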