ARROW-6837 (which, er, includes ARROW-6836) and ARROW-5916 have PRs. Would appreciate some feedback. I will finish the Python part of 6837 when I know I'm on the right track.
Thanks,
John

On Thu, Oct 10, 2019 at 9:54 AM John Muehlhausen <j...@jgm.org> wrote:

> The format change is ARROW-6836 ... add a custom_metadata:[KeyValue] field
> to the Footer table in File.fbs
>
> The other change (slicing a recordbatch to honor RecordBatch.length rather
> than array length if the former is smaller) will hopefully not affect the
> format.
>
>
> On Wed, Oct 9, 2019 at 11:55 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
>> Hi John,
>>
>> Since the 1.0.0 release is focused on Format stability, probably the
>> only real "blockers" will be ensuring that we have hardened multiple
>> implementations (in particular C++ and Java) of the columnar format as
>> specified with integration tests to prove it. The issues you listed
>> sound more like C++ library changes to me?
>>
>> If you want to propose Format-related changes, that would need to
>> happen right away otherwise the ship will sail on that.
>>
>> - Wes
>>
>> On Wed, Oct 9, 2019 at 9:08 PM John Muehlhausen <j...@jgm.org> wrote:
>> >
>> > ARROW-5916
>> > ARROW-6836/6837
>> >
>> > These are of particular interest to me because they enable recordbatch
>> > "incrementalism" which is useful for streaming applications:
>> >
>> > ARROW-5916 allows a recordbatch to pre-allocate space for future records
>> > that have not yet been populated, making it safe for readers to consume
>> > the partial batch.
>> >
>> > ARROW-6836/6837 allows a file of record batches to be extended at the
>> > end, without re-writing the beginning, while including the idea that the
>> > custom_metadata may change with each update. (custom_metadata in the
>> > Schema is not a good candidate because Schema also appears at the
>> > beginning of the file.)
>> >
>> > While these are not blockers for me quite yet, they soon will be! If I
>> > wanted to ensure that these are in 1.0, what is my deadline for
>> > implementation and test cases? Can such a note be made on the wiki?
>> > Should I change the priority in Jira?
>> >
>> > Thanks,
>> > John
>> >
>> > On Wed, Oct 9, 2019 at 2:57 PM Neal Richardson
>> > <neal.p.richard...@gmail.com> wrote:
>> >
>> > > Congratulations everyone on 0.15! I know a lot of hard work went into
>> > > it, not only in the software itself but also in the build and release
>> > > process.
>> > >
>> > > Once you've caught your breath from the release, we should start
>> > > thinking about what's in scope for our next release, the big 1.0. To
>> > > get us started (or restarted, since we did discuss 1.0 before the
>> > > flatbuffer alignment issue came up), I've created
>> > > https://cwiki.apache.org/confluence/display/ARROW/Arrow+1.0.0+Release
>> > > based on our past release wiki pages.
>> > >
>> > > A good place to begin would be to list, either in "blocker" Jiras or
>> > > bullet points on the document, the key features and tasks we must
>> > > resolve before 1.0. For example, I get the sense that we need to
>> > > overhaul the documentation, but that should be expressed in a more
>> > > concrete, actionable way.
>> > >
>> > > Neal
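
For readers skimming the thread, the "honor RecordBatch.length rather than array length" idea behind ARROW-5916 can be illustrated with a small conceptual sketch. This is plain Python, not the Arrow API: the class `PreallocatedBatch` and its methods `add_column`, `append`, and `visible` are hypothetical names invented for illustration, and the actual PRs may work quite differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PreallocatedBatch:
    """Hypothetical stand-in for a record batch; NOT the Arrow API."""
    capacity: int                      # slots reserved in every column
    length: int = 0                    # populated rows, analogous to RecordBatch.length
    columns: Dict[str, List[object]] = field(default_factory=dict)

    def add_column(self, name: str) -> None:
        # Reserve space for `capacity` rows up front.
        self.columns[name] = [None] * self.capacity

    def append(self, row: Dict[str, object]) -> None:
        if self.length >= self.capacity:
            raise ValueError("batch is full")
        for name, value in row.items():
            self.columns[name][self.length] = value
        # Bump the length only after every column has been written.
        self.length += 1

    def visible(self) -> Dict[str, List[object]]:
        # Readers slice each column to `length`, ignoring unpopulated slots.
        return {name: values[: self.length] for name, values in self.columns.items()}


batch = PreallocatedBatch(capacity=4)
batch.add_column("id")
batch.add_column("price")
batch.append({"id": 1, "price": 9.5})
batch.append({"id": 2, "price": 10.0})
print(batch.visible())
```

The point of the sketch is only that the underlying arrays may be longer than the batch's stated length, and that a reader who slices to that length sees just the populated rows.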