FWIW, I lack historical context on how these methods evolved, so I'd
appreciate insight from anyone who has worked on the java codebase for a
longer period of time.  The current situation seems less then ideal.

On Tue, Aug 4, 2020 at 12:55 AM Ji Liu <tianc...@apache.org> wrote:

> Hi all,
>
>
> When I worked on ARROW-7539[1], I met some problems and not sure what's the
> proper way to solve it.
>
>
> This issue was about to avoid set reader/writer indices in
> FieldVector#getFieldBuffers according to the following reasons:
>
> i. getBuffers set reader/writer indices and it's right for the purpose of
> sending the data over the wire
>
> ii. getFieldBuffers is distinct from getBuffers, it should be for getting
> access to underlying data for higher-performance algorithms
>
>
> Currently in VectorUnloader, we used getFieldBuffers to create
> ArrowRecordBatch that's why we keep writer/reader indices in
> getFieldBuffers
> , we should use getBuffers instead. But during the change, we found another
> problem:
>
> The order of validity and offset buffers are not in the same order in
> ListVector(getBuffers's order is right), changing the API in VectorUnloader
> creates problems with serialization/deserialization resulting in test
> failures in Dremio which would break backward compatibility with existing
> serialised files.
>
>
> Micah gives a solution but seems doesn't reach consistent in the PR
> thread[2
> ]:
>
>    1. Remove setReaderWriterIndeces in getFieldBuffers
>    2. Deprecate getBuffers
>    3. Introduce a new getIpcBuffers which is unambiguously used for writing
>    record batches (i.e. in VectorUnloader).
>    4. Update documentation where it makes sense based on all this
>    conversation.
>
>
> More details and discussions can be seen from the PR and hope to get more
> feedback.
>
>
>
> Thanks,
>
> Ji Liu
>
>
>
> [1] https://issues.apache.org/jira/browse/ARROW-7539
>
> [2] https://github.com/apache/arrow/pull/6156
>

Reply via email to