Yeah, I think the spec should be strict. And for convenience, I'd say it should probably be the padded length (though I don't have a strong opinion).

Regards

Antoine.


On 03/10/2019 at 06:23, Micah Kornfield wrote:
> Hi Wes,
> It seems fine to be flexible here.  However:
>
>> This could have implications for hashing or
>> comparisons, for example, so I think that having the flexibility to do
>> either is a good idea.
>
> This statement of use-cases makes me a little nervous.  It seems like it
> could lead to bugs if a consumer is reading from two producers that use
> different alternatives?
>
> Thanks,
> Micah
>
> On Mon, Sep 30, 2019 at 5:24 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
>> I just updated my pull request from May adding language to clarify
>> what protocol writers are expected to set when producing the Arrow
>> binary protocol
>>
>> https://github.com/apache/arrow/pull/4370
>>
>> Implementations may allocate small buffers, or use memory which does
>> not meet the 8-byte minimal padding requirements of the Arrow
>> protocol. It becomes a question, then, whether to set the in-memory
>> buffer size or the padded size when producing the protocol.
>>
>> This PR states that either is acceptable. As an example, a 1-byte
>> validity buffer could have Buffer metadata stating that the size
>> either is 1 byte or 8 bytes. Either way, 7 bytes of padding must be
>> written to conform to the protocol. The metadata, therefore, reflects
>> the "intent" of the protocol writer for the protocol reader. If the
>> writer says the length is 1, then the protocol reader understands that
>> the writer does not expect the reader to concern itself with the 7
>> bytes of padding. This could have implications for hashing or
>> comparisons, for example, so I think that having the flexibility to do
>> either is a good idea.
>>
>> For an application that wants to guarantee that AVX512 instructions
>> can be used on all buffers on the receiver side, it would be
>> appropriate to include 512-bit padding in the accounting.
>>
>> Let me know if others think differently so we can have this properly
>> documented for the 1.0.0 Format release.
>>
>> Thanks,
>> Wes
>>
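
For reference, the arithmetic behind the 1-byte validity buffer example can be sketched as below. This is only an illustration of the two candidate values for the Buffer metadata length: `padded_length` is a hypothetical helper, not part of any Arrow API, and it assumes the alignment is a power of two.

```python
def padded_length(size: int, alignment: int = 8) -> int:
    """Round `size` up to the next multiple of `alignment`.

    Hypothetical helper for illustration only (not an Arrow API).
    Assumes `alignment` is a positive power of two.
    """
    if size < 0:
        raise ValueError("size must be non-negative")
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Adding (alignment - 1) and masking the low bits rounds up.
    return (size + alignment - 1) & ~(alignment - 1)


# The 1-byte validity buffer from the thread.
in_memory_size = 1
padded_size = padded_length(in_memory_size)        # 8-byte minimal padding
padding_bytes = padded_size - in_memory_size       # bytes of padding written

# Either value could be recorded as the buffer length under the "flexible"
# wording; a strict spec would pick exactly one of them.
candidate_lengths = (in_memory_size, padded_size)

# An application that wants 512-bit (64-byte) padding for AVX512.
avx512_size = padded_length(in_memory_size, alignment=64)
```

With the padded length in the metadata, a reader can treat the whole padded region as part of the buffer; with the in-memory length, the reader has to recompute the padding itself to find where the next buffer starts.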