[https://issues.apache.org/jira/browse/ARROW-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
Wes McKinney updated ARROW-8970:
--------------------------------
Description:
We're reaching a point where we may need to be careful about decisions that
increase code size:
* Instantiating too many templates for code that isn't performance-sensitive,
or where several templates do the same thing (e.g. an Int32Type kernel may do
the same thing as a Date32Type kernel)
* Inlining functions that don't need to be inline
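The first bullet can be illustrated with a minimal, self-contained sketch. It assumes hypothetical stand-ins for Int32Type, Date32Type and Int64Type that only expose a `c_type` storage alias (they are not Arrow's real classes), plus a hypothetical AddScalarKernel. Because the kernel is templated on the physical C type rather than on the logical type, Int32 and Date32 resolve to a single instantiation instead of two.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for logical type classes. These are NOT Arrow's
// real types; they only mimic the idea that a logical type exposes the
// C type used to store its values.
struct Int32Type { using c_type = std::int32_t; };
struct Date32Type { using c_type = std::int32_t; };  // days since UNIX epoch
struct Int64Type { using c_type = std::int64_t; };

// Templated on the physical storage type only, so every logical type that
// shares a storage type reuses the same instantiation.
// (Overflow is not handled in this sketch.)
template <typename CType>
void AddScalarKernel(const CType* in, CType scalar, CType* out,
                     std::int64_t length) {
  assert(length >= 0);
  for (std::int64_t i = 0; i < length; ++i) {
    out[i] = static_cast<CType>(in[i] + scalar);
  }
}

// Function-pointer type for the kernel of a given logical type.
template <typename LogicalType>
using KernelFn = void (*)(const typename LogicalType::c_type*,
                          typename LogicalType::c_type,
                          typename LogicalType::c_type*, std::int64_t);

// Map a logical type to the kernel for its physical type.
template <typename LogicalType>
KernelFn<LogicalType> GetAddScalarKernel() {
  return &AddScalarKernel<typename LogicalType::c_type>;
}

static_assert(std::is_same<Int32Type::c_type, Date32Type::c_type>::value,
              "Int32 and Date32 share a physical type");
```

In this sketch, GetAddScalarKernel<Int32Type>() and GetAddScalarKernel<Date32Type>() return the same function pointer, so only Int64Type adds a second copy of the kernel to the object file. Only code that is genuinely type-specific (for example, formatting a Date32 value as a calendar date) would still need its own instantiation.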
Code size also tends to correlate with compilation time, though not always.
I'll use this umbrella issue to organize issues related to reducing compiled
code size.
At this moment (2020-05-27), here are the 25 largest object files (sizes in bytes) in a -O2 build:
{code}
524896 src/arrow/CMakeFiles/arrow_objlib.dir/array/builder_dict.cc.o
531920 src/arrow/CMakeFiles/arrow_objlib.dir/filesystem/s3fs.cc.o
552000 src/arrow/CMakeFiles/arrow_objlib.dir/json/converter.cc.o
575920 src/arrow/CMakeFiles/arrow_objlib.dir/csv/converter.cc.o
595112 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_string.cc.o
645728 src/arrow/CMakeFiles/arrow_objlib.dir/type.cc.o
683040 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_set_lookup.cc.o
702232 src/arrow/CMakeFiles/arrow_objlib.dir/ipc/reader.cc.o
729912 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/coo_converter.cc.o
752776 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csc_converter.cc.o
752776 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csr_converter.cc.o
877680 src/arrow/CMakeFiles/arrow_objlib.dir/array/dict_internal.cc.o
885624 src/arrow/CMakeFiles/arrow_objlib.dir/builder.cc.o
919072 src/arrow/CMakeFiles/arrow_objlib.dir/scalar.cc.o
941776 src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_internal.cc.o
1055248 src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_simple.cc.o
1233304 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_compare.cc.o
1265160 src/arrow/CMakeFiles/arrow_objlib.dir/sparse_tensor.cc.o
1343480 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csf_converter.cc.o
1346928 src/arrow/CMakeFiles/arrow_objlib.dir/array.cc.o
1502568 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_hash.cc.o
1609760 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_numeric.cc.o
1794416 src/arrow/CMakeFiles/arrow_objlib.dir/array/diff.cc.o
2759552 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_filter.cc.o
7609432 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_take.cc.o
{code}
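A listing like the one above can be regenerated with a short program. This is only a sketch of one way to produce such a list, not the command used for this issue: it walks a build directory with std::filesystem, collects every file ending in .o together with its size in bytes, and keeps the largest ones in ascending order. KeepLargest and LargestObjectFiles are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// (size in bytes, path) pair; sorting pairs orders by size first.
using SizedPath = std::pair<std::uintmax_t, std::string>;

// Keep the `n` largest entries, sorted by size in ascending order
// (the same order as the listing above).
std::vector<SizedPath> KeepLargest(std::vector<SizedPath> files,
                                   std::size_t n) {
  assert(n > 0 && "ask for at least one file");
  std::sort(files.begin(), files.end());
  if (files.size() > n) {
    // Drop the smallest entries from the front.
    files.erase(files.begin(),
                files.end() - static_cast<std::ptrdiff_t>(n));
  }
  return files;
}

// Recursively collect every regular "*.o" file under `root` and return
// the `n` largest. Throws fs::filesystem_error if `root` cannot be read.
std::vector<SizedPath> LargestObjectFiles(const fs::path& root,
                                          std::size_t n) {
  std::vector<SizedPath> files;
  for (const auto& entry : fs::recursive_directory_iterator(root)) {
    if (entry.is_regular_file() && entry.path().extension() == ".o") {
      files.emplace_back(entry.file_size(), entry.path().string());
    }
  }
  return KeepLargest(std::move(files), n);
}
```

Calling LargestObjectFiles on the build tree with n = 25 and printing each pair as "size path" yields output in the same shape as the table. This needs C++17; some older GCC releases (e.g. GCC 8) also require linking with -lstdc++fs.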
was:
We're reaching a point where we may need to be careful about decisions that
increase code size:
* Instantiating too many templates for code that isn't performance sensitive
* Inlining functions that don't need to be inline
Code size tends to correlate also with compilation times, but not always.
I'll use this umbrella issue to organize issues related to reducing compiled
code size
At this moment (2020-05-27), here are the 25 largest object files in a -O2 build
{code}
524896 src/arrow/CMakeFiles/arrow_objlib.dir/array/builder_dict.cc.o
531920 src/arrow/CMakeFiles/arrow_objlib.dir/filesystem/s3fs.cc.o
552000 src/arrow/CMakeFiles/arrow_objlib.dir/json/converter.cc.o
575920 src/arrow/CMakeFiles/arrow_objlib.dir/csv/converter.cc.o
595112 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_string.cc.o
645728 src/arrow/CMakeFiles/arrow_objlib.dir/type.cc.o
683040 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_set_lookup.cc.o
702232 src/arrow/CMakeFiles/arrow_objlib.dir/ipc/reader.cc.o
729912 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/coo_converter.cc.o
752776 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csc_converter.cc.o
752776 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csr_converter.cc.o
877680 src/arrow/CMakeFiles/arrow_objlib.dir/array/dict_internal.cc.o
885624 src/arrow/CMakeFiles/arrow_objlib.dir/builder.cc.o
919072 src/arrow/CMakeFiles/arrow_objlib.dir/scalar.cc.o
941776 src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_internal.cc.o
1055248 src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_simple.cc.o
1233304 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_compare.cc.o
1265160 src/arrow/CMakeFiles/arrow_objlib.dir/sparse_tensor.cc.o
1343480 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csf_converter.cc.o
1346928 src/arrow/CMakeFiles/arrow_objlib.dir/array.cc.o
1502568 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_hash.cc.o
1609760 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_numeric.cc.o
1794416 src/arrow/CMakeFiles/arrow_objlib.dir/array/diff.cc.o
2759552 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_filter.cc.o
7609432 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_take.cc.o
{code}
> [C++] Reduce shared library code size (umbrella issue)
> ------------------------------------------------------
>
> Key: ARROW-8970
> URL: https://issues.apache.org/jira/browse/ARROW-8970
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
>