AlenkaF commented on PR #41870:
URL: https://github.com/apache/arrow/pull/41870#issuecomment-2158757139

   I have researched the benchmark regression a bit and found that:
   
   - running the benchmarks for `RecordBatch::ToTensor` shows up to a 40% 
change in time (regressions)
   - removing the Table creation but keeping the code as is (hardcoding the for 
loop over the chunks to a single iteration) reduces the regression to a maximum of 
20%
   
   <details><summary>benchmark diff output</summary>
   
   ```
    (pyarrow-dev) alenkafrim@alenka-mac build % archery --quiet benchmark diff --benchmark-filter=ToTensorSimple
    
    ------------------------------------------------------------
    Non-regressions: (7)
    ------------------------------------------------------------
    benchmark  baseline  contender  change %  counters
    BatchToTensorSimple<Int64Type>/size:65536/num_columns:30  7.321 GiB/sec  7.341 GiB/sec  0.275  {'family_index': 3, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 84665}
    BatchToTensorSimple<Int64Type>/size:65536/num_columns:3  17.341 GiB/sec  17.385 GiB/sec  0.256  {'family_index': 3, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 197830}
    BatchToTensorSimple<Int32Type>/size:65536/num_columns:300  1.153 GiB/sec  1.136 GiB/sec  -1.413  {'family_index': 2, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13151}
    BatchToTensorSimple<Int64Type>/size:65536/num_columns:300  1.221 GiB/sec  1.198 GiB/sec  -1.838  {'family_index': 3, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int64Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 13997}
    BatchToTensorSimple<Int16Type>/size:65536/num_columns:300  1.027 GiB/sec  1.005 GiB/sec  -2.092  {'family_index': 1, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 11502}
    BatchToTensorSimple<Int16Type>/size:65536/num_columns:30  3.824 GiB/sec  3.728 GiB/sec  -2.521  {'family_index': 1, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 43449}
    BatchToTensorSimple<Int16Type>/size:4194304/num_columns:3  4.435 GiB/sec  4.322 GiB/sec  -2.550  {'family_index': 1, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 792}
   
   
    ------------------------------------------------------------
    Regressions: (17)
    ------------------------------------------------------------
    benchmark  baseline  contender  change %  counters
    BatchToTensorSimple<Int64Type>/size:4194304/num_columns:30  5.354 GiB/sec  5.078 GiB/sec  -5.159  {'family_index': 3, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 959}
    BatchToTensorSimple<Int32Type>/size:65536/num_columns:3  8.656 GiB/sec  8.107 GiB/sec  -6.348  {'family_index': 2, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 96401}
    BatchToTensorSimple<Int64Type>/size:4194304/num_columns:300  7.884 GiB/sec  7.371 GiB/sec  -6.506  {'family_index': 3, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1140}
    BatchToTensorSimple<Int16Type>/size:4194304/num_columns:30  2.109 GiB/sec  1.969 GiB/sec  -6.655  {'family_index': 1, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 378}
    BatchToTensorSimple<Int16Type>/size:4194304/num_columns:300  2.007 GiB/sec  1.869 GiB/sec  -6.878  {'family_index': 1, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int16Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 360}
    BatchToTensorSimple<Int32Type>/size:65536/num_columns:30  5.514 GiB/sec  5.116 GiB/sec  -7.218  {'family_index': 2, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int32Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 62798}
    BatchToTensorSimple<Int32Type>/size:4194304/num_columns:300  3.346 GiB/sec  3.066 GiB/sec  -8.379  {'family_index': 2, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 601}
    BatchToTensorSimple<Int8Type>/size:65536/num_columns:300  669.230 MiB/sec  598.420 MiB/sec  -10.581  {'family_index': 0, 'per_family_instance_index': 2, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 7493}
    BatchToTensorSimple<Int16Type>/size:65536/num_columns:3  5.393 GiB/sec  4.745 GiB/sec  -12.015  {'family_index': 1, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int16Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 61699}
    BatchToTensorSimple<Int8Type>/size:4194304/num_columns:300  700.642 MiB/sec  611.987 MiB/sec  -12.653  {'family_index': 0, 'per_family_instance_index': 5, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:300', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 123}
    BatchToTensorSimple<Int8Type>/size:65536/num_columns:30  1.247 GiB/sec  1.075 GiB/sec  -13.836  {'family_index': 0, 'per_family_instance_index': 1, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 14200}
    BatchToTensorSimple<Int32Type>/size:4194304/num_columns:3  6.465 GiB/sec  5.567 GiB/sec  -13.879  {'family_index': 2, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1156}
    BatchToTensorSimple<Int8Type>/size:4194304/num_columns:30  938.704 MiB/sec  792.766 MiB/sec  -15.547  {'family_index': 0, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 164}
    BatchToTensorSimple<Int32Type>/size:4194304/num_columns:30  2.944 GiB/sec  2.453 GiB/sec  -16.660  {'family_index': 2, 'per_family_instance_index': 4, 'run_name': 'BatchToTensorSimple<Int32Type>/size:4194304/num_columns:30', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 529}
    BatchToTensorSimple<Int64Type>/size:4194304/num_columns:3  8.618 GiB/sec  7.157 GiB/sec  -16.959  {'family_index': 3, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int64Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 1521}
    BatchToTensorSimple<Int8Type>/size:4194304/num_columns:3  1.197 GiB/sec  1008.475 MiB/sec  -17.748  {'family_index': 0, 'per_family_instance_index': 3, 'run_name': 'BatchToTensorSimple<Int8Type>/size:4194304/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 227}
    BatchToTensorSimple<Int8Type>/size:65536/num_columns:3  1.314 GiB/sec  1.057 GiB/sec  -19.601  {'family_index': 0, 'per_family_instance_index': 0, 'run_name': 'BatchToTensorSimple<Int8Type>/size:65536/num_columns:3', 'repetitions': 1, 'repetition_index': 0, 'threads': 1, 'iterations': 15032}
   ```
   
   </details>
   
   I also plan to try profiling in Python (`py-spy` [doesn't work on 
macOS](https://github.com/benfred/py-spy/issues/188)); any other suggestions?

