FWIW, I lack historical context on how these methods evolved, so I'd appreciate insight from anyone who has worked on the java codebase for a longer period of time. The current situation seems less then ideal.
On Tue, Aug 4, 2020 at 12:55 AM Ji Liu <tianc...@apache.org> wrote: > Hi all, > > > When I worked on ARROW-7539[1], I met some problems and not sure what's the > proper way to solve it. > > > This issue was about to avoid set reader/writer indices in > FieldVector#getFieldBuffers according to the following reasons: > > i. getBuffers set reader/writer indices and it's right for the purpose of > sending the data over the wire > > ii. getFieldBuffers is distinct from getBuffers, it should be for getting > access to underlying data for higher-performance algorithms > > > Currently in VectorUnloader, we used getFieldBuffers to create > ArrowRecordBatch that's why we keep writer/reader indices in > getFieldBuffers > , we should use getBuffers instead. But during the change, we found another > problem: > > The order of validity and offset buffers are not in the same order in > ListVector(getBuffers's order is right), changing the API in VectorUnloader > creates problems with serialization/deserialization resulting in test > failures in Dremio which would break backward compatibility with existing > serialised files. > > > Micah gives a solution but seems doesn't reach consistent in the PR > thread[2 > ]: > > 1. Remove setReaderWriterIndeces in getFieldBuffers > 2. Deprecate getBuffers > 3. Introduce a new getIpcBuffers which is unambiguously used for writing > record batches (i.e. in VectorUnloader). > 4. Update documentation where it makes sense based on all this > conversation. > > > More details and discussions can be seen from the PR and hope to get more > feedback. > > > > Thanks, > > Ji Liu > > > > [1] https://issues.apache.org/jira/browse/ARROW-7539 > > [2] https://github.com/apache/arrow/pull/6156 >