[ https://issues.apache.org/jira/browse/ARROW-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney updated ARROW-8970:
--------------------------------
    Description:
We're reaching a point where we may need to be careful about decisions that increase code size:

* Instantiating too many templates for code that isn't performance sensitive, or where some templates may do the same thing (e.g. Int32Type kernels may do the same thing as a Date32Type kernel)
* Inlining functions that don't need to be inline

Code size tends to correlate also with compilation times, but not always.

I'll use this umbrella issue to organize issues related to reducing compiled code size

At this moment (2020-05-27), here are the 25 largest object files in a -O2 build

{code}
524896 src/arrow/CMakeFiles/arrow_objlib.dir/array/builder_dict.cc.o
531920 src/arrow/CMakeFiles/arrow_objlib.dir/filesystem/s3fs.cc.o
552000 src/arrow/CMakeFiles/arrow_objlib.dir/json/converter.cc.o
575920 src/arrow/CMakeFiles/arrow_objlib.dir/csv/converter.cc.o
595112 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_string.cc.o
645728 src/arrow/CMakeFiles/arrow_objlib.dir/type.cc.o
683040 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_set_lookup.cc.o
702232 src/arrow/CMakeFiles/arrow_objlib.dir/ipc/reader.cc.o
729912 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/coo_converter.cc.o
752776 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csc_converter.cc.o
752776 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csr_converter.cc.o
877680 src/arrow/CMakeFiles/arrow_objlib.dir/array/dict_internal.cc.o
885624 src/arrow/CMakeFiles/arrow_objlib.dir/builder.cc.o
919072 src/arrow/CMakeFiles/arrow_objlib.dir/scalar.cc.o
941776 src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_internal.cc.o
1055248 src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_simple.cc.o
1233304 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_compare.cc.o
1265160 src/arrow/CMakeFiles/arrow_objlib.dir/sparse_tensor.cc.o
1343480 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csf_converter.cc.o
1346928 src/arrow/CMakeFiles/arrow_objlib.dir/array.cc.o
1502568 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_hash.cc.o
1609760 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_numeric.cc.o
1794416 src/arrow/CMakeFiles/arrow_objlib.dir/array/diff.cc.o
2759552 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_filter.cc.o
7609432 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_take.cc.o
{code}

  was:
We're reaching a point where we may need to be careful about decisions that increase code size:

* Instantiating too many templates for code that isn't performance sensitive
* Inlining functions that don't need to be inline

Code size tends to correlate also with compilation times, but not always.

I'll use this umbrella issue to organize issues related to reducing compiled code size

At this moment (2020-05-27), here are the 25 largest object files in a -O2 build

{code}
524896 src/arrow/CMakeFiles/arrow_objlib.dir/array/builder_dict.cc.o
531920 src/arrow/CMakeFiles/arrow_objlib.dir/filesystem/s3fs.cc.o
552000 src/arrow/CMakeFiles/arrow_objlib.dir/json/converter.cc.o
575920 src/arrow/CMakeFiles/arrow_objlib.dir/csv/converter.cc.o
595112 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_string.cc.o
645728 src/arrow/CMakeFiles/arrow_objlib.dir/type.cc.o
683040 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_set_lookup.cc.o
702232 src/arrow/CMakeFiles/arrow_objlib.dir/ipc/reader.cc.o
729912 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/coo_converter.cc.o
752776 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csc_converter.cc.o
752776 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csr_converter.cc.o
877680 src/arrow/CMakeFiles/arrow_objlib.dir/array/dict_internal.cc.o
885624 src/arrow/CMakeFiles/arrow_objlib.dir/builder.cc.o
919072 src/arrow/CMakeFiles/arrow_objlib.dir/scalar.cc.o
941776 src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_internal.cc.o
1055248 src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_simple.cc.o
1233304 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_compare.cc.o
1265160 src/arrow/CMakeFiles/arrow_objlib.dir/sparse_tensor.cc.o
1343480 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csf_converter.cc.o
1346928 src/arrow/CMakeFiles/arrow_objlib.dir/array.cc.o
1502568 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_hash.cc.o
1609760 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_numeric.cc.o
1794416 src/arrow/CMakeFiles/arrow_objlib.dir/array/diff.cc.o
2759552 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_filter.cc.o
7609432 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_take.cc.o
{code}

> [C++] Reduce shared library code size (umbrella issue)
> ------------------------------------------------------
>
>                 Key: ARROW-8970
>                 URL: https://issues.apache.org/jira/browse/ARROW-8970
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
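
To illustrate the first bullet in the description (Int32Type kernels possibly doing the same thing as a Date32Type kernel), here is a minimal sketch of one way to share a template instantiation between logical types that have the same physical representation. It is not taken from the Arrow codebase: the structs below are hypothetical stand-ins for Arrow's type classes, and `KernelFn`, `CopyValues`, and `GetCopyKernel` are names invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for logical types (NOT the real Arrow classes).
// Each one only exposes the C type used to store its values.
struct Int32Type  { using c_type = std::int32_t; };
struct Date32Type { using c_type = std::int32_t; };  // same storage as Int32Type
struct Int64Type  { using c_type = std::int64_t; };

// Type-erased kernel signature (an assumption made for this sketch).
using KernelFn = void (*)(const void* in, void* out, std::size_t length);

// The loop is templated on the physical C type only, so the compiler
// emits one copy of it per storage type, not one per logical type.
template <typename CType>
void CopyValues(const void* in, void* out, std::size_t length) {
  const auto* src = static_cast<const CType*>(in);
  auto* dst = static_cast<CType*>(out);
  for (std::size_t i = 0; i < length; ++i) {
    dst[i] = src[i];
  }
}

// Thin mapping from a logical type to the shared instantiation.
// Int32Type and Date32Type both resolve to CopyValues<std::int32_t>.
template <typename LogicalType>
KernelFn GetCopyKernel() {
  return &CopyValues<typename LogicalType::c_type>;
}

static_assert(std::is_same<Int32Type::c_type, Date32Type::c_type>::value,
              "Int32Type and Date32Type share a physical representation");
```

With this layout, `GetCopyKernel<Int32Type>()` and `GetCopyKernel<Date32Type>()` return the same function pointer, so the loop body is compiled once for both. Whether a given kernel can be shared this way depends on whether its behavior really is independent of the logical type, which has to be checked kernel by kernel.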