[ 
https://issues.apache.org/jira/browse/ARROW-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shengjun.li updated ARROW-5924:
-------------------------------
    Description: 
cmake_modules/DefineOptions.cmake
   define_option(ARROW_CUDA "Build the Arrow CUDA extensions (requires CUDA toolkit)" ON)
   define_option(ARROW_PLASMA "Build the plasma object store along with Arrow" ON)

The correct sequence is as follows:
 (1) plasma_client.Create(object_id, size, nullptr, 0, &buff, 1);  // where device_num > 0
 (2) plasma_client.Seal(object_id);
 (3) buff = nullptr;
 (4) plasma_client.Release(object_id);
 (5) plasma_client.Delete(object_id);

buff must be set to nullptr (step 3) just before the object is released (step 4), 
because CloseIpcBuffer is called in the destructor of class CudaBuffer.
If a user does not do that promptly, CloseIpcBuffer will be blocked.
Then the following error may occur when another object is created:
     IOError: Cuda Driver API call in /home/zilliz/arrow/cpp/src/arrow/gpu/cuda_context.cc at line 156 failed with code 208: cuIpcOpenMemHandle(&data, *handle, CU_IPC_MEM_LAZY_ENABLE_PEER_ACCESS) (nil)
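
One way to make step 3 hard to forget is to keep buff in a narrower scope than the Release call, so it is destroyed before Release runs. The following is only a sketch under stated assumptions: MockGpuBuffer and MockPlasmaClient are hypothetical stand-ins, not the Arrow or Plasma classes, and they only record the order of calls. The mock destructor marks the point where the real CudaBuffer would call CloseIpcBuffer.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for CudaBuffer. Its destructor logs the point where
// the real class would call CloseIpcBuffer.
struct MockGpuBuffer {
  explicit MockGpuBuffer(std::vector<std::string>* log) : log_(log) {}
  ~MockGpuBuffer() { log_->push_back("CloseIpcBuffer"); }
  std::vector<std::string>* log_;
};

// Hypothetical stand-in for PlasmaClient, reduced to the calls used in this
// issue. It records each call instead of talking to a store.
struct MockPlasmaClient {
  std::vector<std::string> log;

  void Create(int /*object_id*/, std::shared_ptr<MockGpuBuffer>* out) {
    log.push_back("Create");
    *out = std::make_shared<MockGpuBuffer>(&log);
  }
  void Seal(int /*object_id*/) { log.push_back("Seal"); }
  void Release(int /*object_id*/) { log.push_back("Release"); }
  void Delete(int /*object_id*/) { log.push_back("Delete"); }
};

// Runs steps (1) to (5) and returns the recorded call order.
std::vector<std::string> RunCorrectSequence() {
  MockPlasmaClient client;
  const int object_id = 1;
  {
    std::shared_ptr<MockGpuBuffer> buff;
    client.Create(object_id, &buff);  // step 1
    client.Seal(object_id);           // step 2
  }  // step 3: buff goes out of scope here, same effect as buff = nullptr
  client.Release(object_id);          // step 4
  client.Delete(object_id);           // step 5
  return client.log;
}
```

Calling RunCorrectSequence() returns the calls in the order Create, Seal, CloseIpcBuffer, Release, Delete. The inner braces do the same job as the explicit buff = nullptr in step 3.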

Here is a sample.
thread 1:
{
  std::shared_ptr<Buffer> buff;
  plasma_client1.Create(object_id1, size, nullptr, 0, &buff, 1);
  plasma_client1.Seal(object_id1);
  // buff is not set to nullptr here
  plasma_client1.Release(object_id1);
  plasma_client1.Delete(object_id1);
  // ... do something else, or nothing at all
}
// buff is only released automatically here.

thread 2:
{
  std::shared_ptr<Buffer> buff;
  plasma_client2.Create(object_id2, size, nullptr, 0, &buff, 1);
  // If the server allocates the address that object_id1 just released, the error occurs!
}



> [C++][Plasma] It is not convenient to release a GPU object
> ----------------------------------------------------------
>
>                 Key: ARROW-5924
>                 URL: https://issues.apache.org/jira/browse/ARROW-5924
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++ - Plasma
>    Affects Versions: 0.14.0
>            Reporter: shengjun.li
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.1
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
