[
https://issues.apache.org/jira/browse/FLINK-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868505#comment-16868505
]
Liya Fan commented on FLINK-12886:
----------------------------------
[~lzljs3620320], thanks for your feedback and suggestion.
The original intention of this Jira is to:
# Make the logic for manipulating memory segment array simpler.
# Reduce redundant operations to make the code more readable, easier to
maintain, and less likely to produce bugs.
# Improve performance by reducing code complexity (see e.g.
[https://github.com/apache/flink/pull/8773]).
Maybe I should illustrate the effects with an example in the code.
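As a rough illustration only, here is a minimal sketch of the idea. It is not the code from the pull request: plain byte arrays stand in for {{MemorySegment}}, and the class name {{ContainerSegmentSketch}} and its methods are hypothetical. The point is that the index/offset computation and the boundary handling live in one place, so callers address the array as a single continuous region.

```java
/**
 * Hypothetical sketch: wraps equally sized byte arrays (stand-ins for
 * MemorySegment) and exposes them as one continuous address space.
 */
class ContainerSegmentSketch {
    private final byte[][] segments;
    private final int segmentSize;

    ContainerSegmentSketch(byte[][] segments, int segmentSize) {
        this.segments = segments;
        this.segmentSize = segmentSize;
    }

    /** Reads one byte; the segment index and offset are computed here only. */
    byte get(int address) {
        return segments[address / segmentSize][address % segmentSize];
    }

    /** Writes one byte at a global address. */
    void put(int address, byte value) {
        segments[address / segmentSize][address % segmentSize] = value;
    }

    /** Writes a big-endian int; the four bytes may span two segments. */
    void putInt(int address, int value) {
        for (int i = 0; i < 4; i++) {
            put(address + i, (byte) (value >>> (24 - 8 * i)));
        }
    }

    /** Reads a big-endian int that may span two segments. */
    int getInt(int address) {
        int result = 0;
        for (int i = 0; i < 4; i++) {
            result = (result << 8) | (get(address + i) & 0xFF);
        }
        return result;
    }

    public static void main(String[] args) {
        ContainerSegmentSketch container =
                new ContainerSegmentSketch(new byte[2][8], 8);
        // Address 6 leaves only 2 bytes in the first segment; the other
        // 2 bytes land in the second segment without caller involvement.
        container.putInt(6, 0x01020304);
        System.out.println(container.getInt(6) == 0x01020304);
    }
}
```

A byte-by-byte loop is used here for clarity; a real implementation would presumably keep a fast path for values that fit in one segment.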
> Support container memory segment
> --------------------------------
>
> Key: FLINK-12886
> URL: https://issues.apache.org/jira/browse/FLINK-12886
> Project: Flink
> Issue Type: New Feature
> Components: Table SQL / Runtime
> Reporter: Liya Fan
> Assignee: Liya Fan
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2019-06-18-17-59-42-136.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We observe that in many scenarios, the operations/algorithms are based on an
> array of MemorySegment. These memory segments form a large, combined, and
> continuous memory space.
> For example, suppose we have an array of n memory segments. Memory addresses
> from 0 to segment_size - 1 are served by the first memory segment; memory
> addresses from segment_size to 2 * segment_size - 1 are served by the second
> memory segment, and so on.
> Specific algorithms decide the actual MemorySegment to serve the operation
> requests. In some rare cases, two or more memory segments serve the
> requests. There are many operations based on such a paradigm, for example,
> {{BinaryString#matchAt}}, {{SegmentsUtil#copyToBytes}},
> {{LongHashPartition#MatchIterator#get}}, etc.
> The problem is that, for memory segment array based operations, a large amount
> of code is devoted to:
> 1. Computing the memory segment index & offset within the memory segment.
> 2. Processing boundary cases. For example, when writing an integer, only
> 2 bytes may be left in the first memory segment, so the remaining 2 bytes must
> be written to the next memory segment.
> 3. Differentiating the processing of short and long data. For example, when
> copying memory data to a byte array, different methods are implemented for the
> cases where 1) the data fits in a single segment; 2) the data spans multiple
> segments.
> Therefore, there is much duplicated code to achieve the above purposes. What is
> worse, this paradigm significantly increases the amount of code, making the
> code more difficult to read and maintain. Furthermore, it easily gives rise
> to bugs that are difficult to find and debug.
> To address these problems, we propose a new type of memory segment:
> {{ContainerMemorySegment}}. It is based on an array of underlying memory
> segments with the same size. It extends the {{MemorySegment}} base
> class, so it provides all the functionality of {{MemorySegment}}.
> In addition, it hides all the details for dealing with specific memory
> segments, and acts as if it were a big continuous memory region.
> A prototype implementation is given below:
> !image-2019-06-18-17-59-42-136.png|thumbnail!
> With this new type of memory segment, many operations/algorithms can be
> greatly simplified, without affecting performance. This is because:
> 1. Many checks and much of the boundary processing already exist. We just
> move them to the new class.
> 2. We optimize the implementation of the new class, so the special
> optimizations (e.g. optimizations for short data) are still preserved.