[GitHub] [tvm] janetsc commented on a diff in pull request #12947: [Hexagon] [runtime] VTCM Allocator

GitBox Wed, 05 Oct 2022 13:18:43 -0700


janetsc commented on code in PR #12947:
URL: https://github.com/apache/tvm/pull/12947#discussion_r985768688



##########
src/runtime/hexagon/hexagon_vtcm_pool.cc:
##########
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+#include "hexagon_vtcm_pool.h"
+
+#include "HAP_compute_res.h"
+#include "hexagon_common.h"
+
+namespace tvm {
+namespace runtime {
+namespace hexagon {
+
+HexagonVtcmPool::HexagonVtcmPool() {
+  compute_res_attr_t res_info;
+  HEXAGON_SAFE_CALL(HAP_compute_res_attr_init(&res_info));
+
+  // TODO(HWE): get the max  and min size programmatically
+  const unsigned int max_size = 4 * 1024 * 1024;
+  const unsigned int min_size = 1024 * 1024;
+
+  // allocate nbytes of vtcm on a single page
+  HEXAGON_SAFE_CALL(HAP_compute_res_attr_set_vtcm_param_v2(&res_info,
+                                                           /*vtcm_size = */ max_size,
+                                                           /*min_page_size = */ 1,
+                                                           /*min_vtcm_size = */ min_size));
+
+  // TODO(HWE): Investigate why a non-zero timeout results in
+  // hanging, both in the simulator and on hardware.
+  context_id_ = HAP_compute_res_acquire(&res_info, /*timeout = */ 0);
+  CHECK(context_id_) << "HAP_compute_res_acquire failed to acquire requested VTCM resource.";
+  HEXAGON_SAFE_CALL(HAP_compute_res_attr_get_vtcm_ptr_v2(&res_info, &vtcm_data_, &vtcm_size_));
+  CHECK(vtcm_data_ != nullptr) << "HAP_compute_res_acquire returned nullptr when allocating VTCM.";
+  CHECK(vtcm_size_ >= min_size)
+      << "HAP_compute_res_acquire failed to allocate minimum amount of VTCM";
+  free_.emplace_back(std::pair<char*, size_t>(static_cast<char*>(vtcm_data_), vtcm_size_));
+  // DebugDump();
+}
+
+HexagonVtcmPool::~HexagonVtcmPool() { HEXAGON_SAFE_CALL(HAP_compute_res_release(context_id_)); }
+
+void* HexagonVtcmPool::Allocate(size_t nbytes) {
+  std::lock_guard<std::mutex> lock(mutex_);
+
+  CHECK(!free_.empty()) << "No free VTCM";
+
+  // If this is not aligned on a 2k block, allocate from the end to avoid fragmentation
+  if (nbytes & size_t(0x7FF)) {
+    DLOG(INFO) << "VTCM nbytes requested: " << nbytes << " allocate from the end";
+    auto last_free_entry = free_.rbegin();
+    CHECK(last_free_entry->second >= nbytes)
+        << "Not enough contiguous VTCM space at the end to allocate";
+    char* ptr = last_free_entry->first + (last_free_entry->second - nbytes);
+    allocations_.emplace_back(std::pair<char*, size_t>(ptr, nbytes));

Review Comment:
   I tried this and the compiler gave an error.  I'd like to keep the explicit 
std::pair construction for now.
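
   For reference, here is a small sketch of the ways an entry can be added to a
std::list of pairs.  The suggestion that failed to compile is not quoted in
this thread, so which form was tried is an assumption.  Entry and BuildEntries
are made-up names for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>

// Same element type as the pool's lists in the diff above.
using Entry = std::pair<char*, std::size_t>;

// Hypothetical helper: appends three entries, each in a different style.
std::list<Entry> BuildEntries(char* base) {
  std::list<Entry> entries;
  std::size_t nbytes = 128;

  // 1. Form used in the PR: construct the pair, then move it into the list.
  entries.emplace_back(Entry(base, nbytes));

  // 2. Forward the constructor arguments; the pair is built in place.
  entries.emplace_back(base + 128, nbytes);

  // 3. Brace initialization works with push_back, whose parameter type is known.
  entries.push_back({base + 256, nbytes});

  // Not valid: emplace_back is a template, and a braced list has no type
  // that it could deduce.
  // entries.emplace_back({base + 384, nbytes});

  assert(entries.size() == 3);  // sanity check for the sketch
  return entries;
}
```

   All three accepted forms produce the same list contents, so keeping the
explicit std::pair construction costs at most one extra move per entry.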



##########
src/runtime/hexagon/hexagon_vtcm_pool.cc:
##########
@@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+#include "hexagon_vtcm_pool.h"
+
+#include "HAP_compute_res.h"
+#include "hexagon_common.h"
+
+namespace tvm {
+namespace runtime {
+namespace hexagon {
+
+HexagonVtcmPool::HexagonVtcmPool() {
+  compute_res_attr_t res_info;
+  HEXAGON_SAFE_CALL(HAP_compute_res_attr_init(&res_info));
+
+  // TODO(HWE): get the max  and min size programmatically
+  const unsigned int max_size = 4 * 1024 * 1024;
+  const unsigned int min_size = 1024 * 1024;
+
+  // allocate nbytes of vtcm on a single page
+  HEXAGON_SAFE_CALL(HAP_compute_res_attr_set_vtcm_param_v2(&res_info,
+                                                           /*vtcm_size = */ max_size,
+                                                           /*min_page_size = */ 1,
+                                                           /*min_vtcm_size = */ min_size));
+
+  // TODO(HWE): Investigate why a non-zero timeout results in
+  // hanging, both in the simulator and on hardware.
+  context_id_ = HAP_compute_res_acquire(&res_info, /*timeout = */ 0);
+  CHECK(context_id_) << "HAP_compute_res_acquire failed to acquire requested VTCM resource.";
+  HEXAGON_SAFE_CALL(HAP_compute_res_attr_get_vtcm_ptr_v2(&res_info, &vtcm_data_, &vtcm_size_));
+  CHECK(vtcm_data_ != nullptr) << "HAP_compute_res_acquire returned nullptr when allocating VTCM.";
+  CHECK(vtcm_size_ >= min_size)
+      << "HAP_compute_res_acquire failed to allocate minimum amount of VTCM";
+  free_.emplace_back(std::pair<char*, size_t>(static_cast<char*>(vtcm_data_), vtcm_size_));
+  // DebugDump();
+}
+
+HexagonVtcmPool::~HexagonVtcmPool() { HEXAGON_SAFE_CALL(HAP_compute_res_release(context_id_)); }
+
+void* HexagonVtcmPool::Allocate(size_t nbytes) {
+  std::lock_guard<std::mutex> lock(mutex_);
+
+  CHECK(!free_.empty()) << "No free VTCM";
+
+  // If this is not aligned on a 2k block, allocate from the end to avoid fragmentation
+  if (nbytes & size_t(0x7FF)) {
+    DLOG(INFO) << "VTCM nbytes requested: " << nbytes << " allocate from the end";
+    auto last_free_entry = free_.rbegin();
+    CHECK(last_free_entry->second >= nbytes)
+        << "Not enough contiguous VTCM space at the end to allocate";
+    char* ptr = last_free_entry->first + (last_free_entry->second - nbytes);
+    allocations_.emplace_back(std::pair<char*, size_t>(ptr, nbytes));
+    last_free_entry->second -= nbytes;
+    // DebugDump();
+    return ptr;
+  }
+
+  // Best fit: pick the smallest free entry that can hold nbytes.
+  // Track an iterator, not a reference, so that free_.front() is never overwritten.
+  auto entry_to_allocate = free_.begin();
+  for (auto it = free_.begin(); it != free_.end(); it++) {
+    if ((it->second >= nbytes) &&
+        (entry_to_allocate->second < nbytes || it->second < entry_to_allocate->second)) {
+      entry_to_allocate = it;
+      if (entry_to_allocate->second == nbytes) {
+        break;
+      }
+    }
+  }
+  CHECK(entry_to_allocate->second >= nbytes) << "Not enough contiguous VTCM space to allocate";
+  char* ptr = entry_to_allocate->first;
+  allocations_.emplace(allocations_.end(), std::pair<char*, size_t>(ptr, nbytes));
+
+  for (auto it = free_.begin(); it != free_.end(); it++) {
+    if (ptr == it->first) {
+      if (it->second == nbytes) {
+        free_.erase(it);
+      } else {
+        it->first = it->first + nbytes;
+        it->second = it->second - nbytes;
+      }
+      break;
+    }
+  }
+  // DebugDump();
+  return ptr;
+}
+
+void HexagonVtcmPool::Free(void* ptr, size_t nbytes) {
+  char* ptr_to_free = static_cast<char*>(ptr);
+  std::lock_guard<std::mutex> lock(mutex_);
+
+  bool found_allocation_entry = false;
+  for (auto it = allocations_.begin(); it != allocations_.end(); it++) {
+    if (ptr_to_free == it->first) {
+      CHECK(it->second == nbytes) << "Attempted to free a different size than was allocated";
+      allocations_.erase(it);
+      found_allocation_entry = true;
+      break;
+    }
+  }
+  CHECK(found_allocation_entry) << "Attempted to free a pointer that had not been allocated";
+
+  auto it = free_.begin();
+  for (; it != free_.end(); it++) {
+    CHECK(ptr_to_free != it->first) << "Attempting to free a pointer that was already free";
+    if (ptr_to_free < it->first) {
+      CHECK(ptr_to_free + nbytes <= it->first)
+          << "free_ is in an inconsistent state, freed block overlaps with next";
+      if (ptr_to_free + nbytes == it->first) {
+        // Make this entry bigger
+        it->first = ptr_to_free;
+        it->second += nbytes;
+      } else {
+        // Insert an entry before this
+        it = free_.emplace(it, std::pair<char*, size_t>(ptr_to_free, nbytes));
+      }
+      break;
+    }
+  }
+
+  if (it == free_.end()) {
+    // Insert an entry at the end
+    it = free_.emplace(it, std::pair<char*, size_t>(ptr_to_free, nbytes));
+  }
+
+  // Check for overlap with the previous entry
+  if (it != free_.begin()) {
+    auto it_prev = it;
+    it_prev--;
+    CHECK(it_prev->first + it_prev->second <= ptr_to_free)
+        << "free_ is in an inconsistent state, freed block overlaps with previous";
+    if (it_prev->first + it_prev->second == ptr_to_free) {
+      it_prev->second += it->second;
+      free_.erase(it);
+    }
+  }
+  // DebugDump();
+}
+
+void HexagonVtcmPool::DebugDump() {
+  LOG(INFO) << "VTCM list state";
+  for (auto it = allocations_.begin(); it != allocations_.end(); it++) {

Review Comment:
   I switched both loops from explicit iterators to range-based for loops, i.e. for (auto entry : list_).
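
   A minimal sketch of what such a range-based loop over one of these lists
could look like.  DumpEntries and its string output are hypothetical and stand
in for the LOG(INFO) calls, which are not shown in this hunk.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <sstream>
#include <string>
#include <utility>

using Entry = std::pair<char*, std::size_t>;

// Hypothetical helper: writes one "offset size" line per entry.
std::string DumpEntries(const std::list<Entry>& entries, const char* base) {
  assert(base != nullptr);
  std::ostringstream os;
  // `const auto&` reads each pair in place. Plain `auto entry` copies the
  // pair, which is fine for reading but means writes would not reach the list.
  for (const auto& entry : entries) {
    os << (entry.first - base) << " " << entry.second << "\n";
  }
  return os.str();
}
```

   Because the loop only reads the entries, either spelling prints the same
thing.  The distinction matters in loops that are meant to modify the list.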



##########
src/runtime/hexagon/hexagon_vtcm_pool.h:
##########
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#ifndef TVM_RUNTIME_HEXAGON_HEXAGON_VTCM_POOL_H_
+#define TVM_RUNTIME_HEXAGON_HEXAGON_VTCM_POOL_H_
+
+#include <tvm/runtime/c_runtime_api.h>
+#include <tvm/runtime/device_api.h>
+#include <tvm/runtime/logging.h>
+#include <tvm/runtime/ndarray.h>
+#include <tvm/runtime/packed_func.h>
+
+#include <list>
+#include <utility>
+
+namespace tvm {
+namespace runtime {
+namespace hexagon {
+
+class HexagonVtcmPool {
+ public:
+  //! \brief Allocates all of VTCM memory, and manages allocations from the runtime
+  HexagonVtcmPool();
+
+  //! \brief Destruction deallocates the underlying VTCM allocation.
+  ~HexagonVtcmPool();
+
+  //! \brief Prevent copy construction of HexagonVtcmPool.
+  HexagonVtcmPool(const HexagonVtcmPool&) = delete;
+
+  //! \brief Prevent copy assignment with HexagonVtcmPool.
+  HexagonVtcmPool& operator=(const HexagonVtcmPool&) = delete;
+
+  //! \brief Prevent move construction.
+  HexagonVtcmPool(HexagonVtcmPool&&) = delete;
+
+  //! \brief Prevent move assignment.
+  HexagonVtcmPool& operator=(HexagonVtcmPool&&) = delete;
+
+  /* \brief Allocate memory from the VTCM manager
+   *
+   * \param nbytes The number of bytes to allocate.
+   */
+  void* Allocate(size_t nbytes);
+
+  /* \brief Free memory previously allocated from the VTCM manager.
+   *
+   * \param ptr The pointer to the buffer to be freed.
+   *
+   * \param nbytes The number of bytes to be freed.
+   */
+  void Free(void* ptr, size_t nbytes);
+
+  //! \brief Returns the total number of bytes in this pool
+  size_t TotalBytes() { return static_cast<size_t>(vtcm_size_); }
+
+ private:
+  //! \brief Size in bytes of the VTCM allocation
+  unsigned int vtcm_size_;
+
+  //! \brief Pointer to the start of the VTCM allocation
+  void* vtcm_data_;
+
+  //! \brief Context for HAP_compute_res_*
+  unsigned int context_id_{0};
+
+  //! \brief List of allocations
+  std::list<std::pair<char*, size_t>> allocations_;

Review Comment:
   Thanks for the thorough review, @kparzysz-quic.  I considered vectors.  
But, as you say, we will have multiple allocations of tiny blocks in a dynamic 
kernel that doesn't have static memory planning.  I thought inserting and 
deleting in the middle of a vector might cost more than the overhead of 
managing lists.
   
   The allocations_ list holds one entry per individual allocation so that I 
can verify the size on free.  Entries can stay in allocation order.  
(The list doesn't need to be kept in order by allocated ptr.)
   
   The free_ list needs to be able to merge entries when I free so that we keep 
track of the largest free blocks available.  So this one needs to be ordered by 
pointer.
   
   I'd like to keep both as lists for now, and investigate optimizing this in a 
subsequent step (including a side-by-side comparison with a vector 
implementation across many different models).  What I can do now is change the 
order in which I add to the allocations_ list so that new allocations go to the 
front, as we will likely free those small allocations before allocating again.  
I'm still considering that.
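
   As a rough illustration of the two lists described above, here is a
minimal, self-contained sketch.  It is not the PR's code: ListBookkeeping,
RecordAllocation, EraseAllocation and ReturnToFreeList are hypothetical names,
and it assumes new allocation records go to the front of the list, which this
comment only proposes.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <utility>

using Block = std::pair<char*, std::size_t>;

// Hypothetical stand-in for the pool's bookkeeping.
struct ListBookkeeping {
  std::list<Block> allocations;  // unordered; newest record at the front
  std::list<Block> free_blocks;  // sorted by pointer so neighbours can merge

  // Record a new allocation at the front, guessing it will be freed soon.
  void RecordAllocation(char* ptr, std::size_t nbytes) {
    allocations.emplace_front(ptr, nbytes);
  }

  // Drop the record for ptr; returns false on unknown ptr or size mismatch.
  bool EraseAllocation(char* ptr, std::size_t nbytes) {
    for (auto it = allocations.begin(); it != allocations.end(); ++it) {
      if (it->first == ptr) {
        if (it->second != nbytes) return false;
        allocations.erase(it);
        return true;
      }
    }
    return false;
  }

  // Insert a freed block in pointer order, then coalesce with both neighbours.
  void ReturnToFreeList(char* ptr, std::size_t nbytes) {
    auto next = free_blocks.begin();
    while (next != free_blocks.end() && next->first < ptr) ++next;
    assert(next == free_blocks.end() || ptr + nbytes <= next->first);
    auto cur = free_blocks.emplace(next, ptr, nbytes);
    if (next != free_blocks.end() && cur->first + cur->second == next->first) {
      cur->second += next->second;  // absorb the following block
      free_blocks.erase(next);
    }
    if (cur != free_blocks.begin()) {
      auto prev = std::prev(cur);
      if (prev->first + prev->second == cur->first) {
        prev->second += cur->second;  // fold into the preceding block
        free_blocks.erase(cur);
      }
    }
  }
};
```

   In this sketch the merge step only ever touches the two neighbours of the
insertion point, and std::list keeps the other iterators valid across emplace
and erase.  A sorted vector would need element shifts for the same operations,
which is the trade-off the side-by-side comparison would measure.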



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

