[
https://issues.apache.org/jira/browse/FLINK-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963838#comment-16963838
]
Dawid Wysakowicz edited comment on FLINK-13702 at 10/31/19 10:23 AM:
---------------------------------------------------------------------
I agree the solution suggested by [~lzljs3620320] should work, it is very
fragile though. It is based on the assumption that {{MemorySegments}} are
immutable which in general case is not true. You can mutate MemorySegment via
put methods. It is also possible that with this solution different code paths
might work against different MemorySegments for the same {{LazyBinaryFormat}}
instances. It does improve the current situation, I agree. We can add that
solution, but we should keep in mind it is basically a hack and add a proper
documentation with the assumptions that we do there.
I was also thinking if it would make sense to make {{MemorySegment[]
getSegments}} return {{UnmodifiableMemorySegment}}. This would require more
changes, as we would have to extract {{MemorySegment}} interface. Still that
would not make the {{MemorySegment}} fully immutable as it still gives direct
access to underlying memory array via {{getHeapMemory}}.
I would keep mind open for a better solution in a long run though.
As for the BinaryGeneric I think we could:
* remove the {{copy}} method from the class and move it to the {{Serializer}}
as [~lzljs3620320] was suggesting
* in the {{materialize}} method call {{duplicate}}. This would ensure that
every caller thread uses its own copy of the serializer.
* additionally we can think of keeping {{TypeSerializerSnapshot}} instead of
{{Serializer} and call {{restoreSerializer}} in the {{materialize}} method.
Which should create a new copy of a Serializer. Unfortunately thats not the
case for some of the {{Serializer}}s in Blink planner (this should be fixed
independently anyway). To be on the safe side we would still call duplicate.
What do you think?
was (Author: dawidwys):
I agree the solution suggested by [~lzljs3620320] should work, it is very
fragile though. It is based on the assumption that {{MemorySegments}} are
immutable which in general case is not true. You can mutate MemorySegment via
put methods. It is also possible that with this solution different code paths
might work against different MemorySegments for the same {{LazyBinaryFormat}}
instances. It does improve the current situation, I agree. We can add that
solution, but we should keep in mind it is basically a hack and add a proper
documentation with the assumptions that we do there.
I was also thinking if it would make sense to make {{MemorySegment[]
getSegments}} return {{UnmodifiableMemorySegment}}. This would require more
changes, as we would have to extract {{MemorySegment}} interface. Still that
would not make the {{MemorySegment}} fully immutable as it still gives direct
access to underlying memory array via {{getHeapMemory}}.
As for the BinaryGeneric I think we could:
* remove the {{copy}} method from the class and move it to the {{Serializer}}
as [~lzljs3620320] was suggesting
* in the {{materialize}} method call {{duplicate}}. This would ensure that
every caller thread uses its own copy of the serializer.
> BaseMapSerializerTest.testDuplicate fails on Travis
> ---------------------------------------------------
>
> Key: FLINK-13702
> URL: https://issues.apache.org/jira/browse/FLINK-13702
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 1.10.0
> Reporter: Till Rohrmann
> Assignee: Dawid Wysakowicz
> Priority: Critical
> Labels: test-stability
>
> The {{BaseMapSerializerTest.testDuplicate}} fails on Travis with an
> {{java.lang.IndexOutOfBoundsException}}.
> https://api.travis-ci.org/v3/job/570973199/log.txt
--
This message was sent by Atlassian Jira
(v8.3.4#803005)