[ https://issues.apache.org/jira/browse/APEXMALHAR-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513930#comment-15513930 ]
ASF GitHub Bot commented on APEXMALHAR-2190:
--------------------------------------------

GitHub user brightchen reopened a pull request:

    https://github.com/apache/apex-malhar/pull/404

    APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spill…

    …able data structure

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brightchen/apex-malhar APEXMALHAR-2190-PR

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/apex-malhar/pull/404.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #404

----
commit b17ca37cf7f0ae8037ce69206e9ae914f8168d33
Author: brightchen <bri...@datatorrent.com>
Date:   2016-08-16T00:46:27Z

    APEXMALHAR-2190 #resolve #comment Use reusable buffer to serial spillable data structure

----

> Use reusable buffer to serial spillable data structure
> ------------------------------------------------------
>
>                 Key: APEXMALHAR-2190
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2190
>             Project: Apache Apex Malhar
>          Issue Type: Task
>            Reporter: bright chen
>            Assignee: bright chen
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> The spillable data structures create a lot of temporary memory when
> serializing data and do a lot of memory copying (see
> SliceUtils.concatenate(byte[], byte[])), which uses up memory very quickly.
> See APEXMALHAR-2182.
> Use shared memory to avoid allocating temporary memory and copying memory.
> Some basic ideas:
> - SerToLVBuffer interface: provides a method serTo(T object,
> LengthValueBuffer buffer). Instead of creating memory and then returning the
> serialized data, this method lets the caller pass in the buffer, so different
> objects, or an object with embedded objects, can share the same
> LengthValueBuffer.
> - LengthValueBuffer: a buffer that manages the memory as length and value
> (the generic format of serialized data). It provides a length placeholder
> mechanism to avoid temporary memory and data copying when the length is
> known only after the data has been serialized.
> - Memory management classes: the interface ByteStream and its
> implementations Block, FixedBlock and BlocksStream, which provide a
> mechanism to dynamically allocate and manage memory. They basically provide
> the following functions. I tried some other stream mechanisms such as
> ByteArrayInputStream, but they cannot meet the 3rd criterion and don't have
> good performance (50% loss).
>   - dynamically allocate memory
>   - reset memory for reuse
>   - BlocksStream makes sure the output slices will not be changed when extra
> memory is needed; Block can change the reference of the output slices'
> buffer if data was moved due to reallocation of memory (BlocksStream is the
> better solution).
> - WindowableBlocksStream: extends BlocksStream and provides a function to
> reset memory window by window instead of resetting all memory. It provides
> a certain amount of cache (as bytes) in memory.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
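
The length placeholder idea described for LengthValueBuffer in the issue can be illustrated with a minimal Java sketch. This is not the code from the pull request: the class LengthValueSketch and its methods (reserveLength, fillLength, readLength, reset) are hypothetical names chosen for illustration, and the 4-byte big-endian length and doubling growth strategy are assumptions. It only shows how a length slot can be reserved first and filled in after the value is written, so no temporary array or concatenation is needed, and how the same backing array can be reset for reuse.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; not the LengthValueBuffer class from the pull request.
final class LengthValueSketch {
  private byte[] data;
  private int size;

  LengthValueSketch(int initialCapacity) {
    data = new byte[Math.max(initialCapacity, 4)];
  }

  // Grow the backing array when needed (plain doubling, an assumption).
  private void ensure(int extra) {
    if (size + extra > data.length) {
      data = Arrays.copyOf(data, Math.max(data.length * 2, size + extra));
    }
  }

  // Reserve 4 bytes for a length that is not known yet; returns their offset.
  int reserveLength() {
    ensure(4);
    int mark = size;
    size += 4;
    return mark;
  }

  // Append value bytes directly after whatever is already in the buffer.
  void write(byte[] value) {
    ensure(value.length);
    System.arraycopy(value, 0, data, size, value.length);
    size += value.length;
  }

  // Back-fill the reserved slot with the number of bytes written after it.
  void fillLength(int mark) {
    int len = size - mark - 4;
    data[mark] = (byte) (len >>> 24);
    data[mark + 1] = (byte) (len >>> 16);
    data[mark + 2] = (byte) (len >>> 8);
    data[mark + 3] = (byte) len;
  }

  // Read back a big-endian length stored at the given offset.
  int readLength(int offset) {
    return ((data[offset] & 0xFF) << 24) | ((data[offset + 1] & 0xFF) << 16)
        | ((data[offset + 2] & 0xFF) << 8) | (data[offset + 3] & 0xFF);
  }

  int size() {
    return size;
  }

  // Forget the contents but keep the allocated array for reuse.
  void reset() {
    size = 0;
  }

  public static void main(String[] args) {
    LengthValueSketch buf = new LengthValueSketch(16);
    int outer = buf.reserveLength(); // length of the whole record
    int inner = buf.reserveLength(); // length of an embedded value
    buf.write("key".getBytes(StandardCharsets.UTF_8));
    buf.fillLength(inner);
    buf.write("value".getBytes(StandardCharsets.UTF_8));
    buf.fillLength(outer);
    System.out.println(buf.readLength(outer) + " " + buf.readLength(inner));
    buf.reset(); // the same array serves the next record
  }
}
```

Because the placeholders are stored as offsets rather than references, they stay valid even if the backing array is reallocated while the value is being written, which is one reason nested (embedded) objects can share a single buffer in this sketch.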