[ https://issues.apache.org/jira/browse/FLINK-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Flink Jira Bot updated FLINK-12886:
-----------------------------------
Labels: auto-unassigned pull-request-available stale-major (was:
auto-unassigned pull-request-available)
I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I help
the community manage its development. I see this issue has been marked as
Major but is unassigned and neither itself nor its Sub-Tasks have been updated
for 30 days. I have gone ahead and added a "stale-major" label to the issue. If this
ticket is a Major, please either assign yourself or give an update. Afterwards,
please remove the label or in 7 days the issue will be deprioritized.
> Support container memory segment
> --------------------------------
>
> Key: FLINK-12886
> URL: https://issues.apache.org/jira/browse/FLINK-12886
> Project: Flink
> Issue Type: New Feature
> Components: Table SQL / Runtime
> Reporter: Liya Fan
> Priority: Major
> Labels: auto-unassigned, pull-request-available, stale-major
> Attachments: image-2019-06-18-17-59-42-136.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We observe that in many scenarios, operations and algorithms work on an
> array of MemorySegments. Together, these memory segments form a single large,
> contiguous logical memory space.
> For example, suppose we have an array of n memory segments. Memory addresses
> from 0 to segment_size - 1 are served by the first memory segment; memory
> addresses from segment_size to 2 * segment_size - 1 are served by the second
> memory segment, and so on.
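The mapping described above can be expressed with two arithmetic helpers. This is a minimal sketch, not Flink code: {{SegmentAddressing}} and its methods are hypothetical names, and the shift/mask variants assume the segment size is a power of two (e.g. 32 KiB = 2^15 bytes).

```java
// Hypothetical helper illustrating how a "global" address over an array of
// equally sized segments splits into (segment index, offset within segment).
final class SegmentAddressing {
    private SegmentAddressing() {}

    /** Index of the segment that serves the given global address. */
    static int segmentIndex(long address, int segmentSize) {
        return (int) (address / segmentSize);
    }

    /** Offset of the given global address inside its segment. */
    static int offsetInSegment(long address, int segmentSize) {
        return (int) (address % segmentSize);
    }

    /** Same as segmentIndex, valid only when segmentSize == 1 << segmentSizeBits. */
    static int segmentIndexPow2(long address, int segmentSizeBits) {
        return (int) (address >>> segmentSizeBits);
    }

    /** Same as offsetInSegment, valid only when segmentSize == 1 << segmentSizeBits. */
    static int offsetInSegmentPow2(long address, int segmentSizeBits) {
        return (int) (address & ((1L << segmentSizeBits) - 1));
    }
}
```

For example, with 32 KiB segments, address 40000 maps to segment 1 at offset 7232 (40000 - 32768).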
> The specific algorithm decides which MemorySegment actually serves each
> request. In rare cases, a single request is served by two or more memory
> segments. Many operations follow this paradigm, for example,
> {{BinaryString#matchAt}}, {{SegmentsUtil#copyToBytes}},
> {{LongHashPartition#MatchIterator#get}}, etc.
> The problem is that, for operations based on memory segment arrays, a large
> amount of code is devoted to:
> 1. Computing the memory segment index and the offset within that segment.
> 2. Processing boundary cases. For example, when writing a 4-byte integer
> with only 2 bytes left in the current memory segment, the remaining 2 bytes
> must be written to the next memory segment.
> 3. Differentiating the processing of short and long data. For example, when
> copying memory data to a byte array, different methods are implemented for
> the cases where 1) the data fits in a single segment and 2) the data spans
> multiple segments.
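To make items 1 and 2 concrete, here is a sketch of reading a 4-byte integer from such an array when it may straddle a segment boundary. It is not Flink code: plain {{byte[]}} arrays stand in for {{MemorySegment}}, {{SpanningIntReader}} is a hypothetical name, and little-endian byte order is assumed for illustration.

```java
// Hypothetical illustration of the index/offset/boundary pattern.
// byte[] arrays stand in for MemorySegments; little-endian order is assumed.
final class SpanningIntReader {
    private SpanningIntReader() {}

    static int getInt(byte[][] segments, int segmentSize, long address) {
        // 1. Compute the segment index and the offset within that segment.
        int index = (int) (address / segmentSize);
        int offset = (int) (address % segmentSize);
        if (offset + 4 <= segmentSize) {
            // Fast path: all four bytes live in one segment.
            byte[] seg = segments[index];
            return (seg[offset] & 0xFF)
                    | (seg[offset + 1] & 0xFF) << 8
                    | (seg[offset + 2] & 0xFF) << 16
                    | (seg[offset + 3] & 0xFF) << 24;
        }
        // 2. Boundary case: the int crosses a segment boundary, so read it
        //    byte by byte and move to the next segment when the current one ends.
        int result = 0;
        for (int i = 0; i < 4; i++) {
            if (offset == segmentSize) {
                index++;
                offset = 0;
            }
            result |= (segments[index][offset++] & 0xFF) << (8 * i);
        }
        return result;
    }
}
```

Every operation that touches the segment array needs some variant of this logic, which is the duplication described above.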
> As a result, there is a lot of duplicated code serving these purposes. Worse,
> this paradigm significantly increases the amount of code, making it harder
> to read and maintain. Furthermore, it easily gives rise to bugs that are
> difficult to find and debug.
> To address these problems, we propose a new type of memory segment:
> {{ContainerMemorySegment}}. It is backed by an array of underlying memory
> segments of equal size. It extends the {{MemorySegment}} base class, so it
> provides all the functionality of {{MemorySegment}}. In addition, it hides
> the details of dealing with individual memory segments and behaves as if it
> were one large, contiguous memory region.
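A minimal sketch of the container idea follows. It is hypothetical and deliberately simplified: it does not extend Flink's {{MemorySegment}} (plain {{byte[]}} arrays stand in for the underlying segments), {{ContainerSegmentSketch}} is an illustrative name, the byte order is assumed to be little-endian, and the multi-byte accessors go byte by byte for clarity rather than speed.

```java
/**
 * Hypothetical sketch: equally sized segments (byte[] stands in for a Flink
 * MemorySegment) exposed as one contiguous region addressed by global position.
 */
final class ContainerSegmentSketch {
    private final byte[][] segments;
    private final int segmentSize;

    ContainerSegmentSketch(byte[][] segments) {
        if (segments.length == 0) {
            throw new IllegalArgumentException("need at least one segment");
        }
        this.segmentSize = segments[0].length;
        for (byte[] s : segments) {
            if (s.length != segmentSize) {
                throw new IllegalArgumentException("all segments must have the same size");
            }
        }
        this.segments = segments;
    }

    /** Total size of the combined region in bytes. */
    long size() {
        return (long) segmentSize * segments.length;
    }

    byte get(long pos) {
        checkBounds(pos, 1);
        return segments[(int) (pos / segmentSize)][(int) (pos % segmentSize)];
    }

    void put(long pos, byte b) {
        checkBounds(pos, 1);
        segments[(int) (pos / segmentSize)][(int) (pos % segmentSize)] = b;
    }

    /** Little-endian int write; segment boundaries are handled internally. */
    void putInt(long pos, int value) {
        checkBounds(pos, 4);
        for (int i = 0; i < 4; i++) {
            put(pos + i, (byte) (value >>> (8 * i)));
        }
    }

    /** Little-endian int read; segment boundaries are handled internally. */
    int getInt(long pos) {
        checkBounds(pos, 4);
        int result = 0;
        for (int i = 0; i < 4; i++) {
            result |= (get(pos + i) & 0xFF) << (8 * i);
        }
        return result;
    }

    private void checkBounds(long pos, int length) {
        if (pos < 0 || pos + length > size()) {
            throw new IndexOutOfBoundsException("pos=" + pos + ", length=" + length);
        }
    }
}
```

Callers pass a single global position; the index, offset, and boundary logic live in one class instead of being repeated at every call site.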
> A prototype implementation is given below:
> !image-2019-06-18-17-59-42-136.png|thumbnail!
> With this new type of memory segment, many operations/algorithms can be
> greatly simplified without affecting performance, because:
> 1. The checks and boundary processing already exist; we just move them into
> the new class.
> 2. We optimize the implementation of the new class, so special optimizations
> (e.g., optimizations for short data) are preserved.
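The second point can be illustrated with a copy routine that keeps a single-segment fast path inside the container logic. This is a hypothetical sketch: {{ContainerCopy.copyToBytes}} is an illustrative name (it is not the existing {{SegmentsUtil#copyToBytes}}), and {{byte[]}} arrays again stand in for memory segments.

```java
/**
 * Hypothetical sketch of preserving the short-data optimization:
 * byte[] arrays stand in for MemorySegments; names are illustrative only.
 */
final class ContainerCopy {
    private ContainerCopy() {}

    /** Copies numBytes starting at global position pos into target[targetOffset...]. */
    static void copyToBytes(byte[][] segments, int segmentSize, long pos,
                            byte[] target, int targetOffset, int numBytes) {
        int index = (int) (pos / segmentSize);
        int offset = (int) (pos % segmentSize);
        if (offset + numBytes <= segmentSize) {
            // Fast path: the whole range lies in one segment, so one bulk copy suffices.
            System.arraycopy(segments[index], offset, target, targetOffset, numBytes);
            return;
        }
        // Slow path: copy segment by segment until all bytes are transferred.
        int remaining = numBytes;
        while (remaining > 0) {
            int chunk = Math.min(remaining, segmentSize - offset);
            System.arraycopy(segments[index], offset, target, targetOffset, chunk);
            targetOffset += chunk;
            remaining -= chunk;
            index++;
            offset = 0;
        }
    }
}
```

Ranges that fit in one segment cost one index/offset calculation and a single {{System.arraycopy}} call; only ranges that cross a boundary take the loop.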
--
This message was sent by Atlassian Jira
(v8.3.4#803005)