[ 
https://issues.apache.org/jira/browse/ARROW-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-8970:
--------------------------------
    Description: 
We're reaching a point where we may need to be careful about decisions that 
increase code size:

* Instantiating too many templates for code that isn't performance-sensitive, 
or where several instantiations do the same thing (e.g. an Int32Type kernel 
may do the same thing as a Date32Type kernel)
* Inlining functions that don't need to be inline
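
The Int32Type/Date32Type case in the first bullet can be sketched as follows. 
This is a minimal, self-contained illustration, not Arrow's actual kernel 
code: the type tags are simplified stand-ins for the Arrow type classes, and 
AddOneImpl and GetAddOneKernel are hypothetical names. The kernel body is 
templated on the physical C type only, so both logical types resolve to a 
single instantiation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-ins for logical type classes (hypothetical, not Arrow's
// real definitions). Int32Type and Date32Type share a physical representation.
struct Int32Type  { using c_type = std::int32_t; };
struct Date32Type { using c_type = std::int32_t; };
struct Int64Type  { using c_type = std::int64_t; };

// The kernel body depends only on the physical C type, so it is instantiated
// once per physical type rather than once per logical type.
template <typename CType>
void AddOneImpl(const CType* in, CType* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) out[i] = in[i] + 1;
}

template <typename ArrowType>
using KernelFn = void (*)(const typename ArrowType::c_type*,
                          typename ArrowType::c_type*, std::size_t);

// Maps a logical type to the kernel for its physical type.
template <typename ArrowType>
KernelFn<ArrowType> GetAddOneKernel() {
  return &AddOneImpl<typename ArrowType::c_type>;
}

// Self-check: both logical types resolve to the very same function.
inline void CheckSharedKernel() {
  assert(GetAddOneKernel<Int32Type>() == GetAddOneKernel<Date32Type>());
}
```

With this layout, adding another logical type backed by a 32-bit integer adds 
no new copy of the loop; only types with a different physical representation 
(Int64Type here) produce another instantiation.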

Code size also tends to correlate with compilation time, but not always.

I'll use this umbrella issue to organize issues related to reducing compiled 
code size.

As of 2020-05-27, here are the 25 largest object files in a -O2 build:

{code}
524896  src/arrow/CMakeFiles/arrow_objlib.dir/array/builder_dict.cc.o
531920  src/arrow/CMakeFiles/arrow_objlib.dir/filesystem/s3fs.cc.o
552000  src/arrow/CMakeFiles/arrow_objlib.dir/json/converter.cc.o
575920  src/arrow/CMakeFiles/arrow_objlib.dir/csv/converter.cc.o
595112  src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_string.cc.o
645728  src/arrow/CMakeFiles/arrow_objlib.dir/type.cc.o
683040  src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_set_lookup.cc.o
702232  src/arrow/CMakeFiles/arrow_objlib.dir/ipc/reader.cc.o
729912  src/arrow/CMakeFiles/arrow_objlib.dir/tensor/coo_converter.cc.o
752776  src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csc_converter.cc.o
752776  src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csr_converter.cc.o
877680  src/arrow/CMakeFiles/arrow_objlib.dir/array/dict_internal.cc.o
885624  src/arrow/CMakeFiles/arrow_objlib.dir/builder.cc.o
919072  src/arrow/CMakeFiles/arrow_objlib.dir/scalar.cc.o
941776  src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_internal.cc.o
1055248 src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_simple.cc.o
1233304 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_compare.cc.o
1265160 src/arrow/CMakeFiles/arrow_objlib.dir/sparse_tensor.cc.o
1343480 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csf_converter.cc.o
1346928 src/arrow/CMakeFiles/arrow_objlib.dir/array.cc.o
1502568 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_hash.cc.o
1609760 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_numeric.cc.o
1794416 src/arrow/CMakeFiles/arrow_objlib.dir/array/diff.cc.o
2759552 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_filter.cc.o
7609432 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_take.cc.o
{code}

  was:
We're reaching a point where we may need to be careful about decisions that 
increase code size:

* Instantiating too many templates for code that isn't performance sensitive
* Inlining functions that don't need to be inline

Code size also tends to correlate with compilation time, but not always.

I'll use this umbrella issue to organize issues related to reducing compiled 
code size.

As of 2020-05-27, here are the 25 largest object files in a -O2 build:

{code}
524896  src/arrow/CMakeFiles/arrow_objlib.dir/array/builder_dict.cc.o
531920  src/arrow/CMakeFiles/arrow_objlib.dir/filesystem/s3fs.cc.o
552000  src/arrow/CMakeFiles/arrow_objlib.dir/json/converter.cc.o
575920  src/arrow/CMakeFiles/arrow_objlib.dir/csv/converter.cc.o
595112  src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_string.cc.o
645728  src/arrow/CMakeFiles/arrow_objlib.dir/type.cc.o
683040  src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_set_lookup.cc.o
702232  src/arrow/CMakeFiles/arrow_objlib.dir/ipc/reader.cc.o
729912  src/arrow/CMakeFiles/arrow_objlib.dir/tensor/coo_converter.cc.o
752776  src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csc_converter.cc.o
752776  src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csr_converter.cc.o
877680  src/arrow/CMakeFiles/arrow_objlib.dir/array/dict_internal.cc.o
885624  src/arrow/CMakeFiles/arrow_objlib.dir/builder.cc.o
919072  src/arrow/CMakeFiles/arrow_objlib.dir/scalar.cc.o
941776  src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_internal.cc.o
1055248 src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_simple.cc.o
1233304 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_compare.cc.o
1265160 src/arrow/CMakeFiles/arrow_objlib.dir/sparse_tensor.cc.o
1343480 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csf_converter.cc.o
1346928 src/arrow/CMakeFiles/arrow_objlib.dir/array.cc.o
1502568 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_hash.cc.o
1609760 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_numeric.cc.o
1794416 src/arrow/CMakeFiles/arrow_objlib.dir/array/diff.cc.o
2759552 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_filter.cc.o
7609432 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_take.cc.o
{code}


> [C++] Reduce shared library code size (umbrella issue)
> ------------------------------------------------------
>
>                 Key: ARROW-8970
>                 URL: https://issues.apache.org/jira/browse/ARROW-8970
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
