b-enedict opened a new pull request, #2459:
URL: https://github.com/apache/systemds/pull/2459

   # Summary
   
   This PR introduces new functionality for multimodal learning in Scuro, 
including a contrastive learning operator, a modality alignment operator, and 
additional data loaders.
   
   ## Changes
   
   **Contrastive Learning Operator**
   - Constructs modality pairs via a Cartesian product
   - Uses a user-defined function to label pairs as positive or negative
   - Enables dynamic generation of contrastive samples
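
A minimal sketch of the pairing idea described above, not the operator's actual code in Scuro. The function name `contrastive_pairs`, the `(sample_id, payload)` tuple layout, and the `same_id` labeling rule are all assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def contrastive_pairs(
    primary: Iterable[A],
    secondary: Iterable[B],
    is_positive: Callable[[A, B], bool],
) -> Iterator[Tuple[A, B, int]]:
    """Yield (a, b, label) for every pair in the Cartesian product.

    label is 1 when the user-defined function marks the pair as positive,
    otherwise 0. Pairs are yielded lazily, one at a time.
    """
    for a, b in product(primary, secondary):
        yield a, b, 1 if is_positive(a, b) else 0


# Hypothetical samples: each entry is (sample_id, payload).
images = [(0, "img_0"), (1, "img_1")]
captions = [(0, "a cat"), (1, "a dog"), (2, "a tree")]


def same_id(a: Tuple[int, str], b: Tuple[int, str]) -> bool:
    # Assumed labeling rule: a pair is positive when the sample ids match.
    return a[0] == b[0]


pairs = list(contrastive_pairs(images, captions, same_id))
labels = [label for _, _, label in pairs]
```

Because the labeling function is passed in, the same generator can serve different definitions of "positive" (matching ids, shared class, temporal overlap) without changing the pairing logic.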
   
   **Modality Alignment Operator**
   - Aligns previously unaligned modalities using feature-based similarity 
(e.g., ORB, perceptual hashing)
   - Outputs a matching between a primary and secondary modality
   - The matching is applied after representation learning and before fusion
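
To illustrate hash-based alignment, here is a rough sketch under stated assumptions, not the PR's implementation. It swaps in a simple average hash as a stand-in for the perceptual hashing mentioned above, assumes inputs are already small grayscale grids (nested lists of integers), and maps each primary sample to its nearest secondary sample by Hamming distance. The names `average_hash`, `hamming`, `align`, and `max_distance` are hypothetical. The result is a nearest-neighbour mapping, so it is not guaranteed to be one-to-one.

```python
from typing import Dict, List, Sequence

Grid = Sequence[Sequence[int]]  # small grayscale image as rows of pixel values


def average_hash(gray: Grid) -> int:
    """Set one bit per pixel: 1 if the pixel is brighter than the mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def align(primary: List[Grid], secondary: List[Grid], max_distance: int = 1) -> Dict[int, int]:
    """Map each primary index to the closest secondary index.

    A primary sample is left unmatched when its best candidate is farther
    than max_distance bits away.
    """
    sec_hashes = [average_hash(g) for g in secondary]
    matching: Dict[int, int] = {}
    if not sec_hashes:
        return matching
    for i, grid in enumerate(primary):
        h = average_hash(grid)
        j = min(range(len(sec_hashes)), key=lambda k: hamming(h, sec_hashes[k]))
        if hamming(h, sec_hashes[j]) <= max_distance:
            matching[i] = j
    return matching


# Hypothetical 2x2 "images": the secondary list holds the same patterns
# in a different order, plus one flat image that should match nothing.
primary = [[[10, 200], [10, 200]], [[200, 10], [200, 10]]]
secondary = [[[190, 20], [210, 5]], [[12, 220], [8, 190]], [[100, 100], [100, 100]]]

matching = align(primary, secondary)
```

Here `matching` maps primary index 0 to secondary index 1 and primary index 1 to secondary index 0. A feature-based method such as ORB would replace the hash and distance functions while keeping the same overall shape.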
   
   **Data Loaders**
   - PDF loader: converts document pages into NumPy arrays for OpenCV processing
   - Audio loader: transcribes audio to text using faster-whisper
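
For the audio path, a hedged sketch of how transcription with faster-whisper might look. It is not the loader added in this PR. The function names `transcribe_audio` and `join_segments` and the default model size `"base"` are assumptions. The import is kept inside the function so the module loads even when faster-whisper is not installed.

```python
from typing import Iterable


def join_segments(texts: Iterable[str]) -> str:
    """Join per-segment texts into one transcript, dropping empty pieces."""
    return " ".join(t.strip() for t in texts if t.strip())


def transcribe_audio(path: str, model_size: str = "base") -> str:
    """Transcribe an audio file to plain text (requires faster-whisper)."""
    # Imported lazily: faster-whisper is an optional, heavyweight dependency.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size)
    # transcribe() returns a lazy iterator of segments plus metadata;
    # the audio is only decoded as the segments are consumed.
    segments, _info = model.transcribe(path)
    return join_segments(segment.text for segment in segments)
```

Calling `transcribe_audio("speech.wav")` (a hypothetical file name) would download the model on first use and return the transcript as a single string.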

