[ https://issues.apache.org/jira/browse/APEXMALHAR-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
bright chen updated APEXMALHAR-2190:
------------------------------------
Description:
The spillable data structures allocate a lot of temporary memory when
serializing data and do a lot of memory copying (see
SliceUtils.concatenate(byte[], byte[])), which uses up memory very quickly.
See APEXMALHAR-2182.
Use shared memory to avoid temporary allocations and memory copies.
Some basic ideas:
- SerToLVBuffer interface: provides a method serTo(T object, LengthValueBuffer
buffer). Instead of allocating memory and returning the serialized data, this
method lets the caller pass in the buffer, so different objects, or an object
with embedded objects, can share the same LengthValueBuffer.
- LengthValueBuffer: a buffer that manages memory as length and value (the
generic format of serialized data). It provides a length placeholder mechanism
to avoid temporary memory and data copies when the length is known only after
the data has been serialized.
- Memory management classes: the interface ByteStream and its implementations
Block, FixedBlock and BlocksStream, which provide a mechanism to dynamically
allocate and manage memory. I tried some other stream mechanisms such as
ByteArrayInputStream, but they cannot meet the third criterion below and do not
perform well (50% loss). These classes basically provide the following
functions:
  - dynamically allocate memory
  - reset memory for reuse
  - BlocksStream makes sure the output slices do not change when extra memory
  is needed; Block can change the buffer that the output slices refer to if
  data is moved by a reallocation (BlocksStream is the better solution).
- WindowableBlocksStream: extends BlocksStream and provides a function to
reset memory window by window instead of resetting all memory. It keeps a
certain amount of cache (in bytes) in memory.
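
A minimal Java sketch of the length placeholder idea above, not the actual
implementation. Only SerToLVBuffer, serTo and LengthValueBuffer are names from
this issue. The methods markPlaceHolderForLength, setValueLengthSince, write,
size, reset and toByteArray, the classes StringSerde, PairSerde and
LengthValuePlaceholderDemo, and the 4-byte big-endian length encoding are all
assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for the proposed LengthValueBuffer: one growable array shared by all writers. */
class LengthValueBuffer {
  private byte[] data = new byte[64];
  private int size = 0;

  private void ensure(int extra) {
    if (size + extra > data.length) {
      data = Arrays.copyOf(data, Math.max(data.length * 2, size + extra));
    }
  }

  /** Reserves 4 bytes for a length that is not known yet and returns their offset. */
  int markPlaceHolderForLength() {
    ensure(4);
    int offset = size;
    size += 4;
    return offset;
  }

  /** Back-fills the reserved slot with the number of bytes written after it. */
  void setValueLengthSince(int placeholder) {
    int length = size - placeholder - 4;
    data[placeholder] = (byte) (length >>> 24);
    data[placeholder + 1] = (byte) (length >>> 16);
    data[placeholder + 2] = (byte) (length >>> 8);
    data[placeholder + 3] = (byte) length;
  }

  /** Appends raw value bytes directly into the shared array. */
  void write(byte[] value) {
    ensure(value.length);
    System.arraycopy(value, 0, data, size, value.length);
    size += value.length;
  }

  int size() {
    return size;
  }

  /** Makes the memory reusable without releasing it. */
  void reset() {
    size = 0;
  }

  byte[] toByteArray() {
    return Arrays.copyOf(data, size);
  }
}

/** Interface shape taken from the issue text: the caller supplies the buffer. */
interface SerToLVBuffer<T> {
  void serTo(T object, LengthValueBuffer buffer);
}

class StringSerde implements SerToLVBuffer<String> {
  @Override
  public void serTo(String s, LengthValueBuffer buffer) {
    int mark = buffer.markPlaceHolderForLength();
    buffer.write(s.getBytes(StandardCharsets.UTF_8));
    buffer.setValueLengthSince(mark);
  }
}

/** Object with embedded objects: the outer length is known only after both fields are written. */
class PairSerde implements SerToLVBuffer<String[]> {
  private final StringSerde inner = new StringSerde();

  @Override
  public void serTo(String[] pair, LengthValueBuffer buffer) {
    int mark = buffer.markPlaceHolderForLength();
    inner.serTo(pair[0], buffer);
    inner.serTo(pair[1], buffer);
    buffer.setValueLengthSince(mark);
  }
}

class LengthValuePlaceholderDemo {
  public static void main(String[] args) {
    LengthValueBuffer buffer = new LengthValueBuffer();
    new PairSerde().serTo(new String[] {"ab", "cde"}, buffer);
    // 4 (outer length) + 4 + 2 ("ab") + 4 + 3 ("cde") = 17 bytes
    System.out.println(buffer.size());
    buffer.reset(); // the same memory is reused for the next object
  }
}
```

Note that this single growable array moves its data when it reallocates, which
is the behaviour attributed to Block above; keeping earlier output slices
stable, as BlocksStream is meant to, would need a list of blocks instead.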
was:
The spillable data structures allocate a lot of temporary memory when
serializing data and do a lot of memory copying (see
SliceUtils.concatenate(byte[], byte[])), which uses up memory very quickly.
See APEXMALHAR-2182.
Use shared memory to avoid temporary allocations and memory copies.
Some basic ideas:
- SerToLVBuffer interface: provides a method serTo(T object, LengthValueBuffer
buffer). Instead of allocating memory and returning the serialized data, this
method lets the caller pass in the buffer, so different objects, or an object
with embedded objects, can share the same LengthValueBuffer.
- LengthValueBuffer: a buffer that manages memory as length and value (the
generic format of serialized data). It provides a length placeholder mechanism
to avoid temporary memory and data copies when the length is known only after
the data has been serialized.
- Memory management classes: the interface ByteStream and its implementations
Block, FixedBlock and BlocksStream, which provide a mechanism to dynamically
allocate and manage memory. I tried some other stream mechanisms such as
ByteArrayInputStream, but they cannot meet the third criterion below and do not
perform well (50% loss). These classes basically provide the following
functions:
  - dynamically allocate memory
  - reset memory for reuse
  - BlocksStream makes sure the output slices do not change when extra memory
  is needed; Block can change the buffer that the output slices refer to if
  data is moved by a reallocation (BlocksStream is the better solution).
> Use reusable buffer to serial spillable data structure
> ------------------------------------------------------
>
> Key: APEXMALHAR-2190
> URL: https://issues.apache.org/jira/browse/APEXMALHAR-2190
> Project: Apache Apex Malhar
> Issue Type: Task
> Reporter: bright chen
> Assignee: bright chen
> Original Estimate: 240h
> Remaining Estimate: 240h