[ https://issues.apache.org/jira/browse/APEXMALHAR-2190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573426#comment-15573426 ]
bright chen commented on APEXMALHAR-2190:
-----------------------------------------

Add new features:
- Add ByteStreamMonitor to monitor all byte streams. Its functions include getTotalSize(), getTotalCapacity(), listStreamSize(), and listStreamCapacity().
- Add the interface BlocksAdjustStrategy and its implementation DefaultBlocksAdjustStrategy to adjust (release) extra memory.
- Add dataSizeUpToWindow() and dataSizeOfWindow() to WindowableBlocksStream to support querying the data size of windows from other threads.
- Modify Bucket.DefaultBucket.freeMemory(long) to calculate the size of the data that can be freed and send a reset request to the main thread.

> Use reusable buffer to serial spillable data structure
> ------------------------------------------------------
>
>                 Key: APEXMALHAR-2190
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2190
>             Project: Apache Apex Malhar
>          Issue Type: Task
>            Reporter: bright chen
>            Assignee: bright chen
>   Original Estimate: 240h
>  Remaining Estimate: 240h
>
> Spillable Data Structure creates a lot of temporary memory to serialize data
> and does a lot of memory copying (see SliceUtils.concatenate(byte[], byte[])),
> which uses up memory very quickly. See APEXMALHAR-2182.
> Use shared memory to avoid allocating temporary memory and copying memory.
> Some basic ideas:
> - SerToLVBuffer interface provides a method serTo(T object, LengthValueBuffer
>   buffer): instead of allocating memory and then returning the serialized data,
>   this method lets the caller pass in the buffer. So different objects, or
>   objects with embedded objects, can share the same LengthValueBuffer.
> - LengthValueBuffer: a buffer that manages memory as length and value (the
>   generic format of serialized data). It provides a length placeholder
>   mechanism to avoid temporary memory and data copying when the length can
>   only be known after the data is serialized.
> - Memory management classes: the interface ByteStream and its implementations
>   Block, FixedBlock, and BlocksStream, which provide a mechanism to dynamically
>   allocate and manage memory. I tried some other stream mechanisms, such as
>   ByteArrayInputStream, but they cannot meet the 3rd criterion below and their
>   performance is poor (50% loss). These classes basically provide the
>   following functions:
>   - dynamically allocate memory
>   - reset memory for reuse
>   - BlocksStream makes sure the output slices will not be changed when extra
>     memory is needed; Block may change the buffer referenced by output slices
>     if data is moved due to memory reallocation (BlocksStream is the better
>     solution).
> - WindowableBlocksStream extends BlocksStream and provides a function to reset
>   memory window by window instead of resetting all memory. It keeps a certain
>   amount of cache (in bytes) in memory.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
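A minimal Java sketch of two ideas from the issue description: a serializer that writes into a caller-supplied buffer (the serTo idea), and a buffer with a length placeholder that is back-filled once the value's size is known. All names here (SerToBufferSketch, LengthValueBufferSketch, reserveLength, fillLength, StringSer, PairSer, LengthValueDemo) are hypothetical and are not the Malhar API. The 4-byte big-endian length prefix and the array-doubling growth are also assumptions; a BlocksStream as described above would add memory without moving existing data, so slices already handed out stay valid.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.Map;

// Hypothetical sketch: the caller passes in a shared buffer instead of receiving a new byte[].
interface SerToBufferSketch<T> {
  void serTo(T object, LengthValueBufferSketch buffer);
}

// Hypothetical length-value buffer; not the Malhar LengthValueBuffer.
class LengthValueBufferSketch {
  private byte[] data = new byte[16];
  private int size;

  // Grows by copying (assumption); a block-based stream would avoid moving data.
  private void ensureCapacity(int extra) {
    if (size + extra > data.length) {
      data = Arrays.copyOf(data, Math.max(data.length * 2, size + extra));
    }
  }

  // Reserves 4 bytes for a length that is only known after serialization.
  int reserveLength() {
    ensureCapacity(4);
    int offset = size;
    size += 4;
    return offset;
  }

  // Appends value bytes directly after the data already in the buffer.
  void write(byte[] bytes) {
    ensureCapacity(bytes.length);
    System.arraycopy(bytes, 0, data, size, bytes.length);
    size += bytes.length;
  }

  // Back-fills the placeholder with the number of bytes written after it (big-endian).
  void fillLength(int offset) {
    int length = size - offset - 4;
    for (int i = 0; i < 4; i++) {
      data[offset + i] = (byte) (length >>> (24 - 8 * i));
    }
  }

  int readInt(int offset) {
    int value = 0;
    for (int i = 0; i < 4; i++) {
      value = (value << 8) | (data[offset + i] & 0xFF);
    }
    return value;
  }

  int size() { return size; }

  // Makes the memory reusable for the next object without reallocating it.
  void reset() { size = 0; }
}

class StringSer implements SerToBufferSketch<String> {
  public void serTo(String s, LengthValueBufferSketch buffer) {
    int lengthOffset = buffer.reserveLength();
    buffer.write(s.getBytes(StandardCharsets.UTF_8));
    buffer.fillLength(lengthOffset);
  }
}

class PairSer implements SerToBufferSketch<Map.Entry<String, String>> {
  private final StringSer stringSer = new StringSer();

  public void serTo(Map.Entry<String, String> pair, LengthValueBufferSketch buffer) {
    // The outer length covers both embedded strings; no byte arrays are concatenated.
    int lengthOffset = buffer.reserveLength();
    stringSer.serTo(pair.getKey(), buffer);
    stringSer.serTo(pair.getValue(), buffer);
    buffer.fillLength(lengthOffset);
  }
}

class LengthValueDemo {
  public static void main(String[] args) {
    LengthValueBufferSketch buffer = new LengthValueBufferSketch();
    new PairSer().serTo(new AbstractMap.SimpleEntry<>("key", "value"), buffer);
    // Layout: [len=16][len=3]key[len=5]value
    System.out.println(buffer.size());     // 20
    System.out.println(buffer.readInt(0)); // 16
  }
}
```

Serializing the pair ("key", "value") fills one buffer with [16][3]key[5]value: the outer length (16) is written only after both embedded strings are serialized, so nothing has to be concatenated the way SliceUtils.concatenate(byte[], byte[]) does, and reset() lets the same memory be reused for the next object.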