[
https://issues.apache.org/jira/browse/ARROW-17280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Corey Kosak updated ARROW-17280:
--------------------------------
Description:
When a user's C++ program links to both Arrow and an installation of the
Flatbuffers library, the program can crash or send corrupt Arrow messages.
The reason for this is version incompatibility between the vendored (and
trimmed-down) version of Flatbuffers that lives inside Arrow, and whatever
version of Flatbuffers the user is using.
The community seems to be aware of this issue, at least as it impacts Java:
ARROW-5579
In C++, the problem is especially pernicious because it is not even diagnosed
at build time (e.g. as a duplicate-symbol linker error). The methods being used
are templates, so the compiler emits their definitions as weak symbols. As we
all know, when a weak symbol is defined in two different compilation units, the
linker assumes the definitions are identical and just picks one. Here, the
result is that either Arrow or the user program gets different Flatbuffers code
than it expected, and the program crashes or emits corrupt messages.
Arrow doesn't advertise the version of Flatbuffers that it vendored, so the
user cannot work around the problem by building against the same version. In
any case, it would be a little unfriendly to force the user to use that exact
version of Flatbuffers even if it could be identified.
The good news is that there is an easy workaround. Arrow C++ doesn't export
Flatbuffers as part of its public interface. Instead, it just uses it
internally, as an implementation detail. Therefore it is easy to just move the
vendored Flatbuffers from the namespace "flatbuffers" to some other private
namespace. In my PR, I change the namespace to arrow_thirdparty_flatbuffers.
Then I create a namespace alias which makes flatbuffers an alias for
arrow_thirdparty_flatbuffers. The net result is that (thanks to the new
namespace) the symbols the compiler emits for the vendored code live in the
"private" namespace arrow_thirdparty_flatbuffers, and therefore don't conflict
with any other Flatbuffers, but (thanks to the alias) the calling code in the
rest of the Arrow library doesn't have to change at all.
You might prefer a nested namespace instead, such as
arrow::thirdparty::flatbuffers, or some other choice.
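To make the rename-plus-alias idea concrete, here is a minimal single-file
sketch. It is not the code from the PR: LibraryTag, arrow_demo, ArrowSideCall
and UserSideCall are hypothetical names invented for illustration, and the
alias is placed inside a demo namespace (an assumption, made only so that both
copies can be declared in one file).

```cpp
#include <string>

// Hypothetical stand-in for the vendored copy, moved to a private namespace.
namespace arrow_thirdparty_flatbuffers {
template <typename T>
std::string LibraryTag(const T&) { return "vendored"; }
}  // namespace arrow_thirdparty_flatbuffers

// Hypothetical stand-in for the user's own Flatbuffers installation.
// Same template name and signature, but a different definition.
namespace flatbuffers {
template <typename T>
std::string LibraryTag(const T&) { return "user"; }
}  // namespace flatbuffers

// Hypothetical stand-in for code inside the Arrow library.
namespace arrow_demo {
// Namespace alias: code in this scope keeps writing "flatbuffers::",
// but the name now resolves to the private namespace.
namespace flatbuffers = ::arrow_thirdparty_flatbuffers;

inline std::string ArrowSideCall() { return flatbuffers::LibraryTag(42); }
}  // namespace arrow_demo

// User code at global scope still sees the real ::flatbuffers.
inline std::string UserSideCall() { return flatbuffers::LibraryTag(42); }
```

In this sketch ArrowSideCall() instantiates
arrow_thirdparty_flatbuffers::LibraryTag<int> and UserSideCall() instantiates
flatbuffers::LibraryTag<int>. The two instantiations have different qualified
names, so they are emitted as different symbols and the linker has nothing to
merge. The nested spelling (arrow::thirdparty::flatbuffers) would work the
same way.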
was:
When a user's C++ program links to both Arrow and an installation of the
Flatbuffers library, the program can crash or send corrupt Arrow messages.
The reason for this is version incompatibility between the vendored (and
trimmed-down) version of Flatbuffers that lives inside Arrow, and whatever
version the user is using.
The community seems to be aware of this issue, at least as it impacts Java:
ARROW-5579
In C++, the problem is especially pernicious because it is not even diagnosed
at build time (e.g. by duplicate linker symbols). The methods being used are
templates and so their definitions are emitted as weak symbols by the compiler.
As we all know, when a weak symbol is defined in two different compilation
units, the linker assumes their definitions are identical and it will just pick
one. Here, the result is that either Arrow or the user program gets different
Flatbuffers code than what it expected, and the program crashes.
Arrow doesn't even advertise the version of Flatbuffers that it vendored so
it's impossible for the user to even ameliorate this problem. In any case, it
would be a little unfriendly to force the user to use that exact version of
Flatbuffers even if it could be identified.
The good news is that there is an easy workaround. Arrow C++ doesn't export
Flatbuffers as part of its public interface. Instead, it just uses it
internally, as an implementation detail. Therefore it is easy to just move the
vendored Flatbuffers from the namespace "flatbuffers" to some other private
namespace. In my PR, I change the namespace to arrow_thirdparty_flatbuffer.
Then I create a namespace alias which makes flatbuffers an alias for
arrow_thirdparty_flatbuffers. The net result is that (thanks to the new
namespace) the symbols exported by the linker are in the "private" namespace
arrow_thirdparty_flatbuffers, and therefore don't conflict with any other
flatbuffers, but (thanks to the alias) the calling code in the rest of the
Arrow library doesn't have to change at all.
You might prefer a nested namespace instead, such as
arrow::thirdparty::flatbuffers, or some other choice.
> Move vendored flatbuffers to private namespace
> ----------------------------------------------
>
> Key: ARROW-17280
> URL: https://issues.apache.org/jira/browse/ARROW-17280
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Affects Versions: 5.0.0, 6.0.2, 7.0.1, 8.0.1
> Reporter: Corey Kosak
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
--
This message was sent by Atlassian Jira
(v8.20.10#820010)