[
https://issues.apache.org/jira/browse/FLINK-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815190#comment-16815190
]
Jingsong Lee commented on FLINK-11775:
--------------------------------------
My goal is to optimize the serialization of BinaryRow, which currently happens on two
kinds of output views:
1. AbstractPagedOutputView: used in Sort, HashTable, etc.
2. DataOutputSerializer: because DataOutputSerializer keeps its bytes in a byte[],
they can be copied directly from a MemorySegment. It is used in two scenarios:
Scenario 1: in RecordWriter, right before records are sent to the network.
Scenario 2: in the serialization of RocksDBValueState.
My original intention was to optimize BinaryRow serialization on both kinds of
views.
The current idea is:
Let AbstractPagedOutputView and DataOutputSerializer implement
MemorySegmentWritable.
In AbstractPagedOutputView, implement write(MemorySegment segment, int off, int
len) using MemorySegment.copyTo(MemorySegment), i.e. a segment-to-segment copy.
In DataOutputSerializer, implement write(MemorySegment segment, int off, int
len) using MemorySegment.get(byte[]), i.e. a copy into its internal byte[].
Then, in BinaryRowSerializer.serialize(), if the output view is an instance of
MemorySegmentWritable, call write(MemorySegment, ...); otherwise, serialize the
row through the plain DataOutputView interface.
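A minimal, self-contained sketch of the DataOutputSerializer half of this idea and of the instanceof dispatch is shown below. It is not Flink code: `MemorySegment` here is a hypothetical heap-only stand-in exposing only a `get(int, byte[], int, int)` bulk copy, `SegmentAwareOutput` is a hypothetical byte[]-backed output standing in for DataOutputSerializer, `java.io.OutputStream` stands in for DataOutputView, and `RowWriter.writeRow` is a hypothetical helper playing the role of BinaryRowSerializer.serialize().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

// Hypothetical heap-only stand-in for Flink's MemorySegment (NOT the real class).
class MemorySegment {
    final byte[] heap;

    MemorySegment(byte[] heap) { this.heap = heap; }

    // Bulk copy: segment[index, index + length) -> dst[offset, offset + length)
    void get(int index, byte[] dst, int offset, int length) {
        System.arraycopy(heap, index, dst, offset, length);
    }
}

// The interface proposed in FLINK-11775 (signature taken from the issue).
interface MemorySegmentWritable {
    void write(MemorySegment segment, int off, int len) throws IOException;
}

// Hypothetical byte[]-backed output in the spirit of DataOutputSerializer:
// because its bytes live in a byte[], it can copy straight out of the segment.
class SegmentAwareOutput extends ByteArrayOutputStream implements MemorySegmentWritable {
    @Override
    public void write(MemorySegment segment, int off, int len) {
        if (count + len > buf.length) {
            // Grow the inherited buffer (ByteArrayOutputStream.buf and count are protected).
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, count + len));
        }
        segment.get(off, buf, count, len); // one copy, no temporary byte[]
        count += len;
    }
}

// Hypothetical helper mirroring the dispatch proposed for BinaryRowSerializer.serialize().
class RowWriter {
    static void writeRow(MemorySegment seg, int off, int len, OutputStream out) throws IOException {
        if (out instanceof MemorySegmentWritable) {
            // Fast path: the output copies directly from the segment.
            ((MemorySegmentWritable) out).write(seg, off, len);
        } else {
            // Generic path: stage the bytes in a temporary array, then write that array.
            byte[] tmp = new byte[len];
            seg.get(off, tmp, 0, len);
            out.write(tmp, 0, len);
        }
    }
}
```

With a `SegmentAwareOutput`, the bytes go straight from the segment into the output buffer; with any other stream they take the detour through a temporary array. Putting the capability on a separate interface, rather than on the base view type, leaves views that cannot handle segments directly on the generic path.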
Thanks [~srichter], [~StephanEwen] and [~pnowojski] for your advice:
1. Letting DataOutputView implement MemorySegmentWritable is a bad idea: not every
DataOutputView is able to deal directly with a MemorySegment.
2. Keeping MemorySegmentWritable internal is good, so that only our Table code can
touch it.
> Introduce MemorySegmentWritable to let DataOutputView direct copy to internal bytes
> -----------------------------------------------------------------------------------
>
> Key: FLINK-11775
> URL: https://issues.apache.org/jira/browse/FLINK-11775
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Operators
> Reporter: Jingsong Lee
> Assignee: Jingsong Lee
> Priority: Major
>
> Blink's new binary format is based on MemorySegment.
> Introduce MemorySegmentWritable to let a DataOutputView copy directly from a
> MemorySegment into its internal bytes.
> {code:java}
> import java.io.IOException;
>
> import org.apache.flink.core.memory.MemorySegment;
>
> /**
>  * Provides the interface for write(MemorySegment).
>  */
> public interface MemorySegmentWritable {
>     /**
>      * Writes {@code len} bytes from memory segment {@code segment} starting at
>      * offset {@code off}, in order, to the output.
>      *
>      * @param segment memory segment to copy the bytes from.
>      * @param off the start offset in the memory segment.
>      * @param len the number of bytes to copy.
>      * @throws IOException if an I/O error occurs.
>      */
>     void write(MemorySegment segment, int off, int len) throws IOException;
> }{code}
>
> If we want to write a MemorySegment to a DataOutputView, we first have to copy
> its bytes into a byte[] and then write that array, which is less efficient.
> If AbstractPagedOutputView had a write(MemorySegment) method, it could copy the
> segment directly.
> We need this in network serialization, serialization in batch operators, and
> streaming state serialization, to avoid allocating a new byte[] and making an
> extra copy.
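To illustrate the paged case, here is a heavily simplified sketch of a segment-to-segment write that spills across page boundaries. All names are hypothetical: `MemorySegment` is a heap-only stand-in with a `copyTo(int, MemorySegment, int, int)` bulk copy, and `PagedOutput` only mimics the idea behind AbstractPagedOutputView (a current page plus a write position). It allocates fresh pages itself, whereas a real paged view would obtain them from its memory management.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical heap-only stand-in for Flink's MemorySegment (NOT the real class).
class MemorySegment {
    final byte[] heap;

    MemorySegment(int size) { this.heap = new byte[size]; }

    // Segment-to-segment bulk copy.
    void copyTo(int offset, MemorySegment target, int targetOffset, int numBytes) {
        System.arraycopy(heap, offset, target.heap, targetOffset, numBytes);
    }
}

// The interface proposed in FLINK-11775.
interface MemorySegmentWritable {
    void write(MemorySegment segment, int off, int len) throws IOException;
}

// Hypothetical, heavily simplified paged output in the spirit of AbstractPagedOutputView.
class PagedOutput implements MemorySegmentWritable {
    private final int pageSize;
    final List<MemorySegment> pages = new ArrayList<>();
    private MemorySegment current;
    private int positionInPage;

    PagedOutput(int pageSize) {
        this.pageSize = pageSize;
        advance();
    }

    // Hypothetical page source: allocate a fresh page and start writing at its beginning.
    private void advance() {
        current = new MemorySegment(pageSize);
        pages.add(current);
        positionInPage = 0;
    }

    @Override
    public void write(MemorySegment segment, int off, int len) {
        while (len > 0) {
            if (positionInPage == pageSize) {
                advance(); // current page is full: continue on the next one
            }
            int chunk = Math.min(len, pageSize - positionInPage);
            segment.copyTo(off, current, positionInPage, chunk); // direct copy, no byte[] in between
            positionInPage += chunk;
            off += chunk;
            len -= chunk;
        }
    }

    int positionInCurrentPage() { return positionInPage; }
}
```

Writing 10 bytes into 4-byte pages fills two pages and leaves 2 bytes on a third, and no intermediate byte[] is allocated along the way.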
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)