[
https://issues.apache.org/jira/browse/SPARK-35396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chendi.Xue updated SPARK-35396:
-------------------------------
Description:
This PR proposes an add-on that supports manually closing entries in
MemoryStore and InMemoryRelation.
h3. What changes were proposed in this pull request?
Currently:
MemoryStore uses a LinkedHashMap[BlockId, MemoryEntry[_]] to store all OnHeap
or OffHeap entries.
When memoryStore.remove(blockId) is called, the code simply removes the entry
from the LinkedHashMap and relies on Java GC to release it.
This PR:
We propose an add-on that manually closes any object stored in MemoryStore or
InMemoryRelation if that object implements AutoCloseable.
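The proposed behavior can be illustrated with a minimal, self-contained sketch. It is not the patch itself and does not use Spark's classes: ClosingStore and TrackedBuffer are hypothetical names invented for this example, and the store is keyed by an arbitrary type rather than BlockId. It only shows the idea of closing an entry on remove() and clear() when the entry implements AutoCloseable.

```scala
import scala.collection.mutable

// Hypothetical stand-in for MemoryStore; not Spark's API.
final class ClosingStore[K] {
  // Insertion-ordered map, mirroring the LinkedHashMap mentioned above.
  private val entries = mutable.LinkedHashMap.empty[K, Any]

  def put(key: K, value: Any): Unit = entries.synchronized {
    entries.put(key, value)
  }

  // Remove one entry; close it if it implements AutoCloseable.
  // Returns true if an entry was present.
  def remove(key: K): Boolean = {
    val removed = entries.synchronized { entries.remove(key) }
    removed.foreach(closeIfPossible)
    removed.isDefined
  }

  // Drop every entry, closing the closeable ones.
  def clear(): Unit = {
    val snapshot = entries.synchronized {
      val values = entries.values.toList
      entries.clear()
      values
    }
    snapshot.foreach(closeIfPossible)
  }

  private def closeIfPossible(value: Any): Unit = value match {
    case c: AutoCloseable => c.close()
    case _                => () // plain objects are left to the GC
  }
}

// Hypothetical resource that records whether close() was called.
final class TrackedBuffer extends AutoCloseable {
  @volatile var closed = false
  override def close(): Unit = closed = true
}
```

In this sketch close() runs after the map lock is released, so a slow or failing close() does not block other callers. Whether the actual change does the same is not stated in this description.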
Verification:
In our own use case, we implemented a user-defined off-heap-hashRelation for
BHJ (broadcast hash join) and verified that, with this manual close, our
off-heap-hashRelation is released when the entry is evicted.
We also implemented a user-defined cachedBatch, which with this PR is released
when InMemoryRelation.clearCache() is called.
h3. Why are the changes needed?
These changes help clean up off-heap user-defined objects that may be cached
in InMemoryRelation or MemoryStore.
h3. Does this PR introduce _any_ user-facing change?
NO
h3. How was this patch tested?
WIP
Signed-off-by: Chendi Xue [[email protected]|mailto:[email protected]]
was:
Current MemoryStore uses a LinkedHashMap[BlockId, MemoryEntry[_]] to store all
OnHeap or OffHeap entries.
When memoryStore.remove(blockId) is called, the code simply removes the entry
from the LinkedHashMap and relies on Java GC to release it.
We are proposing an add-on: if the object implements AutoCloseable, we can
call its close() directly in the MemoryStore.clear() and MemoryStore.remove()
functions.
In our case, we are implementing a user-defined off-heap-hashRelation for BHJ,
and we verified that, with this manual close, our off-heap-hashRelation is
released when the entry is evicted.
At the same time, we also want to add this logic to InMemoryRelation to
manually close a user-defined CachedBatch.
> Support to manual close entries in MemoryStore and InMemoryRelation
> -------------------------------------------------------------------
>
> Key: SPARK-35396
> URL: https://issues.apache.org/jira/browse/SPARK-35396
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core, SQL
> Affects Versions: 3.1.1
> Reporter: Chendi.Xue
> Priority: Major
>