>
> To be clear, I agree that we need to check that our various validation
> and integration suites pass properly.  But once that is done and
> assuming all the metadata variations are properly tested, data
> variations should not pose any problem.
>

Unless I'm misunderstanding your proposal, that doesn't address data that
has already been produced: data written in a way that works today but that
this change would treat as non-consumable. If the constraint is enforced at
the format level, there is no way for flatbuf to parse data that doesn't comply.
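To make the distinction concrete, here is a minimal sketch. It assumes a decoded flatbuffer table can be modeled as a plain dict where an absent optional field is a missing key; `verify_required`, `read_optional`, `VerificationError` and the field names are all hypothetical, not Arrow or FlatBuffers APIs. A field marked required at the format level behaves like the strict check (rejected before the reader sees it). A hand-written read-side check lets the reader decide what an absent value means.

```python
# Hypothetical model: a decoded flatbuffer table is represented as a dict,
# and an absent optional field is represented by a missing key.


class VerificationError(Exception):
    """Raised when a buffer violates the schema (models a verifier)."""


def verify_required(table, required_fields):
    # Models fields marked `required` in the schema: the check happens
    # before the reader sees the data, so there is no way to opt out.
    missing = [name for name in required_fields if table.get(name) is None]
    if missing:
        raise VerificationError("missing required field(s): " + ", ".join(missing))
    return table


def read_optional(table, name, default):
    # Models a hand-written read-side check on an optional field: the
    # reader chooses how to treat an absent value (here, a fallback).
    value = table.get(name)
    return default if value is None else value


# Hypothetical data written before any write-side enforcement existed.
legacy_table = {"name": "example"}  # "fields" was never written

try:
    verify_required(legacy_table, ["name", "fields"])
except VerificationError as err:
    print("strict reader:", err)

print("lenient reader:", read_optional(legacy_table, "fields", []))
```

The strict path rejects the already-written data outright, while the lenient path still reads it; that is the compatibility gap described above.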


> The write side is irrelevant here, since the concern is to protect
> reliably against invalid input (especially due to malicious intent).
>

Not really.  If this had been enforced on the write side since day 1,
enforcing on the read side now would be a noop. If we started enforcing
this on the write side today across all languages then it would make it
more feasible to incorporate into the read side six months or a year from
now (as data ages out). I don't know about others, but our use of persisted
Arrow flatbuf serialization is primarily focused on datasets with a fairly
short shelf life (months rather than years).


> Of course, we can hand-write all the NULL checks on the read side.  My
> concern is not the one-time cost of doing so, but the long-term
> fragility of such a strategy


I agree with you in principle about using tools rather than humans to
minimize mistakes. On the flip side, we chose to use optional for the same
reason that flatbuf defaults to optional, protobuf2 recommended the use of
optional over required, and protobuf3 removed the ability to express fields
as required [1].


> (every refactor or format addition is a
> threat to the robustness of the IPC reader).


Any format additions can be implemented however we want (required,
optional, etc.), so I don't see that as related to the issue at hand.


> I don't think a potential
> long-standing history of security issues in Arrow would help adoption.


This is a strawman argument. I also think we should avoid having a
long-standing history of security issues.

[1] https://github.com/protocolbuffers/protobuf/issues/2497
