tqchen commented on code in PR #15064:
URL: https://github.com/apache/tvm/pull/15064#discussion_r1225409068
##########
src/runtime/relax_vm/lm_support.cc:
##########
@@ -167,12 +168,56 @@ class AttentionKVCache : public ObjectRef {
TVM_REGISTER_OBJECT_TYPE(AttentionKVCacheObj);
+/*!
+ * \brief Create multiple kv caches with same shape, from single memory allocation.
+ * \param init_data The initial data to put into the cache. Ignored if init_fill_count is
+ * less than 0.
+ * \param reserve_shape The shape of cache.
+ * \param init_fill_count The initial row to fill into
+ * the cache.
+ * \param num_caches Number of caches to create.
+ */
+Array<AttentionKVCache> CreateMultipleKVCaches(NDArray init_data, ShapeTuple reserve_shape,
+ int init_fill_count, int num_caches) {
+ DLDataType dtype = init_data->dtype;
+
+ int64_t cache_size = (dtype.bits * dtype.lanes + 7) / 8;
Review Comment:
I think this is fine for now, since sub-byte types are usually packed
manually (the stored dtype is i32).
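
For reference, a minimal sketch of the rounding under discussion. `DTypeSketch`, `BytesPerElement`, and `CacheBytes` are hypothetical names used only for illustration (the struct mirrors the `bits` and `lanes` fields of `DLDataType`); this is not the code in the PR. It shows that an unpacked 4-bit dtype rounds up to one full byte per element, which is why the formula only over-counts when sub-byte values are not already packed into a wider dtype such as i32.

```cpp
#include <cstdint>

// Hypothetical stand-in for DLDataType; only the fields used below matter.
struct DTypeSketch {
  uint8_t code;    // type code (int, uint, float, ...), unused here
  uint8_t bits;    // bits per lane
  uint16_t lanes;  // number of lanes per element
};

// Bytes occupied by one element, rounded up to a whole byte.
// Same arithmetic as the reviewed line: (bits * lanes + 7) / 8.
constexpr int64_t BytesPerElement(DTypeSketch dtype) {
  return (static_cast<int64_t>(dtype.bits) * dtype.lanes + 7) / 8;
}

// Total bytes for num_elements elements under that per-element rounding.
// For an unpacked sub-byte dtype this is larger than the tightly packed size.
constexpr int64_t CacheBytes(DTypeSketch dtype, int64_t num_elements) {
  return BytesPerElement(dtype) * num_elements;
}
```

With an i32 storage dtype the result is exact (4 bytes per element), so manually packed sub-byte data is sized correctly; only a literal 4-bit dtype would be sized at 1 byte per element instead of half a byte.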