clintropolis opened a new pull request, #19561:
URL: https://github.com/apache/druid/pull/19561

   ### Description
   Follow-up to #19512. This PR adds opt-in `jdk.incubator.vector`
specializations of the ungrouped `aggregate(buf, position, startRow, endRow)`
path of `LongSumVectorAggregator`, `DoubleSumVectorAggregator`, and
`FloatSumVectorAggregator`. The specializations are controlled by the same
`druid.expressions.useVectorApi` flag that the prior PR introduced for
expression vector processors (off by default).
   
   
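   With `useVectorApi=true`, `SqlExpressionBenchmark` shows lower ms/op on each
of these queries, from about 9% lower (query 0) to about 59% lower (query 11):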
   ```
   // 0: non-expression timeseries reference, 1 columns
   "SELECT SUM(long1) FROM expressions",
   // 4: non-expression timeseries reference, 5 columns
   "SELECT SUM(long1), SUM(long4), SUM(double1), SUM(float3), SUM(long5) FROM expressions",
   // 7: math op - 2 longs
   "SELECT SUM(long1 * long2) FROM expressions",
   // 11: all same math op - 3 longs, 1 double, 1 float
   "SELECT SUM(long5 * float3 * long1 * long4 * double1) FROM expressions",
   ```
   
   ```
   Benchmark                        (complexCompression)  (deferExpressionDimensions)  (jsonObjectStorageEncoding)  (query)  (rowsPerSegment)  (schemaType)  (storageType)  (stringEncoding)  (useVectorApi)  (vectorize)  Mode  Cnt   Score   Error  Units
   SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE        0           1500000      explicit           MMAP              UTF8           false        force  avgt    5   5.793 ± 0.554  ms/op
   SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE        0           1500000      explicit           MMAP              UTF8            true        force  avgt    5   5.270 ± 0.264  ms/op
   SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE        4           1500000      explicit           MMAP              UTF8           false        force  avgt    5  32.833 ± 0.879  ms/op
   SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE        4           1500000      explicit           MMAP              UTF8            true        force  avgt    5  25.135 ± 0.905  ms/op
   SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE        7           1500000      explicit           MMAP              UTF8           false        force  avgt    5  14.073 ± 0.602  ms/op
   SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE        7           1500000      explicit           MMAP              UTF8            true        force  avgt    5   9.602 ± 0.362  ms/op
   SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE       11           1500000      explicit           MMAP              UTF8           false        force  avgt    5  61.997 ± 1.508  ms/op
   SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE       11           1500000      explicit           MMAP              UTF8            true        force  avgt    5  25.191 ± 0.953  ms/op
   ```
   
   I also changed query 0 to `SUM(long5)`, a column that has nulls. It shows a
larger improvement (16.936 to 10.676 ms/op, about 37% lower) than the original
query 0 on `long1`, which has no nulls.
   
   ```
   Benchmark                        (complexCompression)  (deferExpressionDimensions)  (jsonObjectStorageEncoding)  (query)  (rowsPerSegment)  (schemaType)  (storageType)  (stringEncoding)  (useVectorApi)  (vectorize)  Mode  Cnt   Score   Error  Units
   SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE        0           1500000      explicit           MMAP              UTF8           false        force  avgt    5  16.936 ± 1.073  ms/op
   SqlExpressionBenchmark.querySql                  NONE                 singleString                        SMILE        0           1500000      explicit           MMAP              UTF8            true        force  avgt    5  10.676 ± 0.752  ms/op
   ```
   
   changes:
   * add `NullAwareVectorAggregator` marker interface declaring `aggregate(buf, 
position, startRow, endRow, nullVector)` for delegates that handle nulls 
themselves; return value reports whether any non-null row contributed.
   * update `NullableNumericVectorAggregator.aggregate(buf, position, startRow, 
endRow)` to a three-way dispatch: null-free fast path; `instanceof 
NullAwareVectorAggregator` -> new null-aware overload (set null marker iff it 
returned true); else existing scatter-gather fallback.
   * add `SimdLongSumVectorAggregator`, `SimdDoubleSumVectorAggregator`, and 
`SimdFloatSumVectorAggregator` under `query/aggregation/simd/`. Each extends 
its scalar parent, overrides the ungrouped no-null aggregate with a 
`va.add(vb)` + `reduceLanes(VectorOperators.ADD)` SIMD reduction, and 
implements the null-aware overload using `VectorMask`-based masked accumulation 
with `notNull.trueCount()` for the non-null check.
   * wire `Long/Double/FloatSumAggregatorFactory.factorizeVector` to dispatch 
on `ExpressionProcessing.useVectorApi()`.
   * add `SimdSumVectorAggregatorTest`, which, for each of the three types,
tries various vector sizes and null patterns and asserts that the SIMD output
matches the scalar reference (exact for long; within a relative tolerance for
double/float, to accommodate SIMD's tree-reduce vs. scalar's left-to-right
reduce).
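
For readers skimming the list, the three-way dispatch can be sketched in plain Java roughly as follows. This is only an illustration: `VectorAggSketch`, `NullAwareAggSketch`, `LongSumSketch`, `NullableWrapperSketch`, the one-byte marker layout, and the `Path` return value are all hypothetical stand-ins and not the signatures in this PR. The fallback here aggregates non-null rows one at a time, which simplifies the existing scatter-gather path, and the sketch assumes the marker flips to "not null" when the delegate returns true.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified stand-in for a vector aggregator; not Druid's real interface.
interface VectorAggSketch {
    void aggregate(ByteBuffer buf, int position, int startRow, int endRow);
}

// Hypothetical stand-in for the null-aware overload described in the list above.
interface NullAwareAggSketch extends VectorAggSketch {
    // Returns true if at least one non-null row in [startRow, endRow) contributed.
    boolean aggregate(ByteBuffer buf, int position, int startRow, int endRow, boolean[] nullVector);
}

// Long sum over an in-memory column that handles nulls itself.
final class LongSumSketch implements NullAwareAggSketch {
    private final long[] values;

    LongSumSketch(long[] values) {
        this.values = values;
    }

    @Override
    public void aggregate(ByteBuffer buf, int position, int startRow, int endRow) {
        long sum = 0;
        for (int i = startRow; i < endRow; i++) {
            sum += values[i];
        }
        buf.putLong(position, buf.getLong(position) + sum);
    }

    @Override
    public boolean aggregate(ByteBuffer buf, int position, int startRow, int endRow, boolean[] nullVector) {
        long sum = 0;
        int nonNull = 0;
        for (int i = startRow; i < endRow; i++) {
            if (!nullVector[i]) {
                sum += values[i];
                nonNull++;
            }
        }
        buf.putLong(position, buf.getLong(position) + sum);
        return nonNull > 0;
    }
}

// Wrapper that keeps a one-byte null marker in front of the delegate's 8-byte state.
final class NullableWrapperSketch {
    enum Path { NULL_FREE, NULL_AWARE, FALLBACK }

    private static final byte IS_NULL = 1;
    private static final byte NOT_NULL = 0;

    private final VectorAggSketch delegate;

    NullableWrapperSketch(VectorAggSketch delegate) {
        this.delegate = delegate;
    }

    void init(ByteBuffer buf, int position) {
        buf.put(position, IS_NULL);
        buf.putLong(position + 1, 0L);
    }

    boolean isNull(ByteBuffer buf, int position) {
        return buf.get(position) == IS_NULL;
    }

    // nullVector == null means the column has no nulls in this vector.
    Path aggregate(ByteBuffer buf, int position, int startRow, int endRow, boolean[] nullVector) {
        if (nullVector == null) {
            // 1) Null-free fast path: hand the whole range to the delegate.
            if (endRow > startRow) {
                delegate.aggregate(buf, position + 1, startRow, endRow);
                buf.put(position, NOT_NULL);
            }
            return Path.NULL_FREE;
        }
        if (delegate instanceof NullAwareAggSketch) {
            // 2) Null-aware delegate: it skips nulls itself and reports whether anything contributed.
            boolean any = ((NullAwareAggSketch) delegate).aggregate(buf, position + 1, startRow, endRow, nullVector);
            if (any) {
                buf.put(position, NOT_NULL);
            }
            return Path.NULL_AWARE;
        }
        // 3) Fallback: feed non-null rows to the delegate one at a time (simplified).
        for (int i = startRow; i < endRow; i++) {
            if (!nullVector[i]) {
                delegate.aggregate(buf, position + 1, i, i + 1);
                buf.put(position, NOT_NULL);
            }
        }
        return Path.FALLBACK;
    }
}
```

In this sketch a buffer slot stays marked null when every row in the range is null, which mirrors the "return value reports whether any non-null row contributed" contract described above.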

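The relative tolerance for double/float comes from reduction order: a lane-wise accumulation followed by a horizontal reduce adds the same values in a different order than a left-to-right loop, so the floating-point results can differ in the last bits. The sketch below emulates that with plain arrays instead of `jdk.incubator.vector`, so it runs without the incubator module. The lane count of 4, the pairwise reduce order, and every name in it are assumptions for illustration only; the order actually used by the real lane reduction may differ.

```java
// Plain-Java emulation of masked lane-wise accumulation; all names are hypothetical.
final class ReductionOrderSketch {
    static final int LANES = 4; // assumed lane count, e.g. 4 doubles in a 256-bit vector

    // Scalar reference: left-to-right sum that skips null rows.
    static double scalarSum(double[] v, boolean[] isNull) {
        double sum = 0;
        for (int i = 0; i < v.length; i++) {
            if (!isNull[i]) {
                sum += v[i];
            }
        }
        return sum;
    }

    // Emulated SIMD: one accumulator per lane, nulls masked out, then a pairwise reduce.
    static double laneWiseSum(double[] v, boolean[] isNull) {
        double[] acc = new double[LANES];
        int upper = v.length - (v.length % LANES);
        int i = 0;
        for (; i < upper; i += LANES) {
            for (int lane = 0; lane < LANES; lane++) {
                if (!isNull[i + lane]) { // plays the role of a not-null mask bit
                    acc[lane] += v[i + lane];
                }
            }
        }
        // Horizontal reduce in one possible tree order.
        double sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
        // Scalar tail for rows that do not fill a whole vector.
        for (; i < v.length; i++) {
            if (!isNull[i]) {
                sum += v[i];
            }
        }
        return sum;
    }

    // Relative comparison of the kind a double/float test would need.
    static boolean withinRelativeTolerance(double expected, double actual, double relTol) {
        double scale = Math.max(Math.abs(expected), Math.abs(actual));
        return Math.abs(expected - actual) <= relTol * scale;
    }
}
```

For long sums the two orders agree exactly because two's-complement addition is associative, which is why an exact comparison is sufficient there.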

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

