[ 
https://issues.apache.org/jira/browse/MAHOUT-878?focusedWorklogId=1000941&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-1000941
 ]

ASF GitHub Bot logged work on MAHOUT-878:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Jan/26 10:03
            Start Date: 20/Jan/26 10:03
    Worklog Time Spent: 10m 
      Work Description: ryankert01 opened a new pull request, #881:
URL: https://github.com/apache/mahout/pull/881

   ### Purpose of PR
   Add zero-copy encoding support for PyTorch CUDA tensors in 
`QdpEngine.encode()`. Previously, CUDA tensors were rejected with "Only CPU 
tensors are currently supported", forcing users to call `.cpu()` first and 
incurring a GPU→CPU→GPU copy overhead.
   
   ### Motivation
   
   When working with PyTorch CUDA tensors, the previous workflow required:
   ```python
   # Before: GPU → CPU → GPU (slow)
   result = engine.encode(cuda_tensor.cpu(), num_qubits, "amplitude")
   ```
   
   With this change:
   ```python
   # After: GPU → GPU (zero-copy)
   result = engine.encode(cuda_tensor, num_qubits, "amplitude")
   ```
   
   This eliminates unnecessary memory transfers and improves performance for 
GPU-based ML pipelines.
   
   ### Changes
   
   | File | Changes |
   |------|---------|
   | `qdp-core/src/lib.rs` | Added `unsafe` methods `encode_from_gpu_ptr()` and `encode_batch_from_gpu_ptr()` |
   | `qdp-python/src/lib.rs` | Added CUDA tensor detection, validation, and DLPack extraction; modified `encode()` to route CUDA tensors |
   | `qdp-core/.../amplitude.rs` | Made `calculate_inv_norm_gpu()` public |
   | `testing/.../test_bindings.py` | Added 14 new CUDA tensor tests |
   
   
   ### Error Messages
   
   | Condition | Message |
   |-----------|---------|
   | Non-float64 dtype | `CUDA tensor must have dtype float64, got {dtype}. Use tensor.to(torch.float64)` |
   | Non-contiguous | `CUDA tensor must be contiguous. Use tensor.contiguous()` |
   | Empty tensor | `CUDA tensor cannot be empty` |
   | Device mismatch | `Device mismatch: tensor is on cuda:{X}, but engine is on cuda:{Y}. Move tensor with tensor.to('cuda:{Y}')` |
   | Zero/NaN/Inf values | `Input data has zero or non-finite norm (contains NaN, Inf, or all zeros)` |
   | Unsupported encoding | `CUDA tensor encoding currently only supports 'amplitude' method, got '{method}'...` |
   
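   The checks above can be sketched in Python as follows. The function and argument names here are hypothetical (the actual validation is implemented in the Rust bindings), but the raised messages mirror the table. The zero/NaN/Inf-norm check is computed during encoding itself and is omitted here.

   ```python
def validate_cuda_tensor(dtype: str, contiguous: bool, numel: int,
                         tensor_dev: int, engine_dev: int, method: str) -> None:
    """Raise with the documented message when a CUDA tensor is unusable."""
    if dtype != "float64":
        raise TypeError(f"CUDA tensor must have dtype float64, got {dtype}. "
                        "Use tensor.to(torch.float64)")
    if not contiguous:
        raise ValueError("CUDA tensor must be contiguous. Use tensor.contiguous()")
    if numel == 0:
        raise ValueError("CUDA tensor cannot be empty")
    if tensor_dev != engine_dev:
        raise ValueError(f"Device mismatch: tensor is on cuda:{tensor_dev}, but "
                         f"engine is on cuda:{engine_dev}. Move tensor with "
                         f"tensor.to('cuda:{engine_dev}')")
    if method != "amplitude":
        raise ValueError("CUDA tensor encoding currently only supports "
                         f"'amplitude' method, got '{method}'")

# A valid combination passes silently:
validate_cuda_tensor("float64", True, 8, 0, 0, "amplitude")
   ```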
   ### Follow-ups
   1. Support `float32` tensors
   2. Support encoding methods beyond amplitude (e.g. `"angle"`, `"basis"`)
   3. Possibly refactor to extend the `QuantumEncoder` trait
   
   ### Testing
   
   ```bash
   pytest testing/qdp/test_bindings.py -v
   ```
   
   
   ### Related Issues or PRs

Issue Time Tracking
-------------------

            Worklog Id:     (was: 1000941)
    Remaining Estimate: 0h
            Time Spent: 10m

> Provide better examples for the parallel ALS recommender code
> -------------------------------------------------------------
>
>                 Key: MAHOUT-878
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-878
>             Project: Mahout
>          Issue Type: Task
>    Affects Versions: 1.0.0
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>            Priority: Major
>             Fix For: 0.6
>
>         Attachments: MAHOUT-878.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should provide examples that show how to apply the parallel ALS 
> recommender to the Netflix or KDD2011 datasets.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
