+1 on a 0.15.0 release. At a minimum, if we could detect a stream written with the old 4-byte prefix and provide a clear error message in Python and Java, I think that would help the transition. If we are also able to implement readers/writers that can fall back to the 4-byte prefix, that would be nice to have.
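For what it's worth, here is a rough sketch of what detection plus fallback could look like. It assumes the new prefix is a 0xFFFFFFFF continuation marker followed by a little-endian int32 metadata length, and that the legacy prefix is just the int32 length. The function names and the `allow_legacy` flag are hypothetical, not an existing Arrow API.

```python
import io
import struct
from typing import BinaryIO, Optional, Tuple

# Assumed layout (0.15.0+): 0xFFFFFFFF continuation marker, then a little-endian int32 length.
# Assumed legacy layout (pre-0.15.0): only the little-endian int32 length.
CONTINUATION_MARKER = 0xFFFFFFFF


def read_message_prefix(stream: BinaryIO, allow_legacy: bool = False) -> Optional[Tuple[int, bool]]:
    """Hypothetical helper: return (metadata_length, is_legacy), or None at end of stream."""
    head = stream.read(4)
    if len(head) == 0:
        return None  # clean EOF
    if len(head) < 4:
        raise EOFError("truncated message prefix")
    (first,) = struct.unpack("<I", head)
    if first == 0:
        return None  # four zero bytes: treat as an end-of-stream marker
    if first == CONTINUATION_MARKER:
        tail = stream.read(4)
        if len(tail) < 4:
            raise EOFError("truncated length after continuation marker")
        (length,) = struct.unpack("<i", tail)
        legacy = False
    else:
        # No marker: the first 4 bytes are themselves a legacy int32 length.
        if not allow_legacy:
            raise ValueError(
                "stream appears to use the legacy 4-byte length prefix "
                "(written by a pre-0.15.0 library); upgrade the writer or enable legacy reading"
            )
        (length,) = struct.unpack("<i", head)
        legacy = True
    if length == 0:
        return None  # zero length after the marker also marks end of stream
    if length < 0:
        raise ValueError(f"invalid metadata length {length}")
    return length, legacy


def write_message_prefix(out: BinaryIO, length: int, legacy: bool = False) -> None:
    """Hypothetical helper: write the prefix for a message with `length` bytes of metadata."""
    if legacy:
        out.write(struct.pack("<i", length))
    else:
        out.write(struct.pack("<Ii", CONTINUATION_MARKER, length))
```

Since a valid legacy length can never be 0xFFFFFFFF (that would be -1 as an int32), the marker is unambiguous: a strict reader can raise a clear error, and a lenient one can fall back to the old format.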
On Wed, Jul 24, 2019 at 1:27 PM Jacques Nadeau <jacq...@apache.org> wrote:
> I'm ok with the change and 0.15 release to better manage it.
>
> > I've always understood the metadata to be a few dozen/hundred KB, a
> > small percentage of the total message size. I could be underestimating
> > the ratios though -- is it common to have tables w/ 1000+ columns? I've
> > seen a few reports like that in cuDF, but I'm curious to hear
> > Jacques'/Dremio's experience too.
> >
>
> Metadata size has been an issue at different points for us. We do
> definitely see datasets with 1000+ columns. It is also compounded by the
> fact that as we add more columns, we typically decrease row count so that
> the individual batches are still easily pipelined--which further increases
> the relative ratio between data and metadata.