[ 
https://issues.apache.org/jira/browse/MAHOUT-878?focusedWorklogId=1001186&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-1001186
 ]

ASF GitHub Bot logged work on MAHOUT-878:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 21/Jan/26 14:32
            Start Date: 21/Jan/26 14:32
    Worklog Time Spent: 10m 
      Work Description: ryankert01 commented on code in PR #881:
URL: https://github.com/apache/mahout/pull/881#discussion_r2712826273


##########
qdp/qdp-python/src/lib.rs:
##########
@@ -171,6 +171,145 @@ fn validate_tensor(tensor: &Bound<'_, PyAny>) -> PyResult<()> {
     Ok(())
 }
 
+/// Check if a PyTorch tensor is on a CUDA device
+fn is_cuda_tensor(tensor: &Bound<'_, PyAny>) -> PyResult<bool> {
+    let device = tensor.getattr("device")?;
+    let device_type: String = device.getattr("type")?.extract()?;
+    Ok(device_type == "cuda")
+}
+
+/// Get the CUDA device index from a PyTorch tensor
+fn get_tensor_device_id(tensor: &Bound<'_, PyAny>) -> PyResult<i32> {
+    let device = tensor.getattr("device")?;
+    let device_index: i32 = device.getattr("index")?.extract()?;
+    Ok(device_index)
+}
+
+/// Validate a CUDA tensor for direct GPU encoding
+/// Checks: dtype=float64, contiguous, non-empty, device_id matches engine
+fn validate_cuda_tensor_for_encoding(
+    tensor: &Bound<'_, PyAny>,
+    expected_device_id: usize,
+    encoding_method: &str,
+) -> PyResult<()> {
+    // Check encoding method support (currently only amplitude is supported for CUDA tensors)
+    if encoding_method != "amplitude" {
+        return Err(PyRuntimeError::new_err(format!(
+            "CUDA tensor encoding currently only supports 'amplitude' method, got '{}'. \
+             Use tensor.cpu() to convert to CPU tensor for other encoding methods.",
+            encoding_method
+        )));
+    }
+
+    // Check dtype is float64
+    let dtype = tensor.getattr("dtype")?;
+    let dtype_str: String = dtype.str()?.extract()?;
+    if !dtype_str.contains("float64") {
+        return Err(PyRuntimeError::new_err(format!(
+            "CUDA tensor must have dtype float64, got {}. Use tensor.to(torch.float64)",
+            dtype_str
+        )));
+    }
+
+    // Check contiguous
+    let is_contiguous: bool = tensor.call_method0("is_contiguous")?.extract()?;
+    if !is_contiguous {
+        return Err(PyRuntimeError::new_err(
+            "CUDA tensor must be contiguous. Use tensor.contiguous()",
+        ));
+    }
+
+    // Check non-empty
+    let numel: usize = tensor.call_method0("numel")?.extract()?;
+    if numel == 0 {
+        return Err(PyRuntimeError::new_err("CUDA tensor cannot be empty"));
+    }
+
+    // Check device matches engine
+    let tensor_device_id = get_tensor_device_id(tensor)?;
+    if tensor_device_id as usize != expected_device_id {
+        return Err(PyRuntimeError::new_err(format!(
+            "Device mismatch: tensor is on cuda:{}, but engine is on cuda:{}. \
+             Move tensor with tensor.to('cuda:{}')",
+            tensor_device_id, expected_device_id, expected_device_id
+        )));
+    }
+
+    Ok(())
+}
+
+/// DLPack tensor information extracted from a PyCapsule
+struct DLPackTensorInfo {
+    data_ptr: *const f64,
+    shape: Vec<i64>,
+    /// CUDA device ID from DLPack metadata.
+    /// Currently unused but kept for potential future device validation or multi-GPU support.
+    #[allow(dead_code)]
+    device_id: i32,
+}
+
+/// Extract GPU pointer from PyTorch tensor's __dlpack__() capsule
+///
+/// # Safety
+/// The returned `data_ptr` points to GPU memory owned by the source tensor.
+/// The caller must ensure the source tensor remains alive and unmodified
+/// for the entire duration that `data_ptr` is in use. Python's GIL ensures
+/// the tensor won't be garbage collected during `encode()`, but the caller
+/// must not deallocate or resize the tensor while encoding is in progress.
+fn extract_dlpack_tensor(_py: Python<'_>, tensor: &Bound<'_, PyAny>) -> PyResult<DLPackTensorInfo> {

Review Comment:
   Yes, but I'm not sure I've solved them all; I only fixed the related ones.
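
For readers skimming the diff, the check order in `validate_cuda_tensor_for_encoding` can be mirrored in plain Python. This is an illustrative sketch only: `FakeTensor` is a hypothetical stand-in for a real `torch.Tensor`, and the sketch raises `RuntimeError` where the Rust code returns a `PyRuntimeError`; the real PR reads these attributes through PyO3 (`getattr` / `call_method0`).

```python
# Illustrative sketch of the validation order in the PR.
# FakeTensor is a hypothetical stand-in for torch.Tensor.
from dataclasses import dataclass

@dataclass
class FakeTensor:
    device_type: str = "cuda"      # tensor.device.type
    device_index: int = 0          # tensor.device.index
    dtype: str = "torch.float64"   # str(tensor.dtype)
    contiguous: bool = True        # tensor.is_contiguous()
    numel: int = 8                 # tensor.numel()

def validate_cuda_tensor_for_encoding(t, expected_device_id, encoding_method):
    # 1. Only amplitude encoding supports CUDA tensors for now
    if encoding_method != "amplitude":
        raise RuntimeError(
            f"CUDA tensor encoding currently only supports 'amplitude', "
            f"got '{encoding_method}'")
    # 2. dtype must be float64
    if "float64" not in t.dtype:
        raise RuntimeError(f"CUDA tensor must have dtype float64, got {t.dtype}")
    # 3. memory must be contiguous
    if not t.contiguous:
        raise RuntimeError("CUDA tensor must be contiguous")
    # 4. tensor must be non-empty
    if t.numel == 0:
        raise RuntimeError("CUDA tensor cannot be empty")
    # 5. tensor must live on the same CUDA device as the engine
    if t.device_index != expected_device_id:
        raise RuntimeError(
            f"Device mismatch: tensor is on cuda:{t.device_index}, "
            f"but engine is on cuda:{expected_device_id}")

validate_cuda_tensor_for_encoding(FakeTensor(), 0, "amplitude")  # passes silently
```

Note the ordering: the cheap attribute checks (method, dtype, contiguity, emptiness) run before the device comparison, so callers get the most actionable error first.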





Issue Time Tracking
-------------------

    Worklog Id:     (was: 1001186)
    Time Spent: 4h 20m  (was: 4h 10m)

> Provide better examples for the parallel ALS recommender code
> -------------------------------------------------------------
>
>                 Key: MAHOUT-878
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-878
>             Project: Mahout
>          Issue Type: Task
>    Affects Versions: 1.0.0
>            Reporter: Sebastian Schelter
>            Assignee: Sebastian Schelter
>            Priority: Major
>             Fix For: 0.6
>
>         Attachments: MAHOUT-878.patch
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> We should provide examples that show how to apply the parallel ALS 
> recommender to the Netflix or KDD2011 datasets.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
