PeilinChen01 opened a new pull request, #2498:
URL: https://github.com/apache/systemds/pull/2498

   ### Summary
   
   This pull request contains the midterm progress for the Isolation Forest builtin implementation in SystemDS.
   
   Main changes:
   
   * Refactored the Isolation Forest training and scoring logic.
   * Clamped `subsampling_size` to the number of input rows when the requested size exceeds it.
   * Added support for single-row scoring in `outlierByIsolationForestApply`.
   * Added support for single-tree models.
   * Added handling for constant data where no valid split is possible.
   * Made seeded training reproducible.
   * Added edge-case tests for the Isolation Forest builtin.
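
As a rough illustration of two of the edge cases listed above (subsample clamping and constant data), here is a minimal Python sketch. It is not the DML code from this pull request: the helper names `clamp_subsampling_size` and `choose_split` are hypothetical, and restricting the split to non-constant features is an assumption about one reasonable way to detect that no valid split exists.

```python
import random


def clamp_subsampling_size(subsampling_size, n_rows):
    """Hypothetical helper: never request more rows per tree than the input has."""
    return min(subsampling_size, n_rows)


def choose_split(rows, rng):
    """Hypothetical helper: pick a random (feature, threshold) pair.

    Returns None when every feature is constant, so the caller can
    turn the current node into a leaf instead of recursing forever.
    """
    n_features = len(rows[0])
    # Only features whose values actually vary can separate the rows.
    candidates = [
        j for j in range(n_features)
        if min(r[j] for r in rows) < max(r[j] for r in rows)
    ]
    if not candidates:
        return None  # constant data: no valid split is possible
    j = rng.choice(candidates)
    lo = min(r[j] for r in rows)
    hi = max(r[j] for r in rows)
    return j, rng.uniform(lo, hi)
```

Passing an explicitly seeded generator such as `random.Random(42)` makes the chosen splits repeatable, which is the same idea as the seeded-training item above, although the actual seeding mechanism in the builtin may differ.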
   
   ### Tests
   
   The following command passes locally:
   
   ```bash
   mvn -Dtest=BuiltinIsolationForestTest test
   ```
   
   Covered test cases include:
   
   * Basic model training and scoring
   * Hybrid execution mode
   * Subsampling-size clamping
   * Single-row apply
   * Single-tree model
   * Constant data
   * Seed reproducibility
   * Anomaly ranking
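
For context on the anomaly-ranking case, the standard Isolation Forest score maps a point's mean path length to a value in (0, 1), with shorter paths giving higher scores. The sketch below follows the textbook formula and is only an assumption about what the builtin computes; the function names are hypothetical and are not part of SystemDS.

```python
import math

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length c(n) of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    # c(n) = 2 * H(n - 1) - 2 * (n - 1) / n, with H(i) approximated by ln(i) + gamma
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, subsample_size):
    """Score s = 2 ** (-E[h(x)] / c(n)); higher means more anomalous."""
    if subsample_size < 2:
        # c(n) is zero here, so the score is undefined in this sketch.
        raise ValueError("subsample_size must be at least 2")
    return 2.0 ** (-mean_path_length / average_path_length(subsample_size))
```

A ranking test in this spirit would assert that a point isolated after few splits scores higher than one that needs many, for example `anomaly_score(2, 256) > anomaly_score(10, 256)`.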
   

