[ 
https://issues.apache.org/jira/browse/FLINK-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963600#comment-16963600
 ] 

Jingsong Lee commented on FLINK-13702:
--------------------------------------

Thanks for the input, [~jark]. It's true that multiple threads read in 
this new scenario. I think we should settle this ticket thoroughly first and 
then look at FLINK-13740. Even adding synchronization would not solve 
FLINK-13740, so I maintain my view that FLINK-13740 is a separate problem.

Hi [~dwysakowicz], I agree that correctness is more important than 
performance. The concern applies not only to long UDF chains but also to 
operator chains, which make things more complex. We have tried to use code 
generation to solve the chain problem before, but the solution was very 
complex and we were unable to solve the problem for a long time. We also 
questioned whether such a complex solution is worth introducing, given the 
extra code complexity it would bring.

Back to this problem: its essence is that we have now introduced 
thread-unsafe lazy initialization.

A simple solution is to add synchronization to materialize.
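
A minimal sketch of what the synchronized variant could look like. The class name SynchronizedLazyString and the materialized flag are hypothetical, and a plain byte[] stands in for MemorySegment[] so the snippet has no Flink dependency:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the lazily materialized string; not Flink's class.
final class SynchronizedLazyString {
    private final String javaObject;
    // Three separately written fields: without a lock, a reader could see a mix.
    private byte[] bytes;
    private int offset;
    private int sizeInBytes;
    private boolean materialized;

    SynchronizedLazyString(String javaObject) {
        this.javaObject = javaObject;
    }

    // The lock makes the check and all field writes appear atomic to other threads.
    synchronized void ensureMaterialized() {
        if (!materialized) {
            byte[] encoded = javaObject.getBytes(StandardCharsets.UTF_8);
            bytes = encoded;
            offset = 0;
            sizeInBytes = encoded.length;
            materialized = true;
        }
    }

    // Readers take the same lock, so every access pays the synchronization cost.
    synchronized int getSizeInBytes() {
        ensureMaterialized();
        return sizeInBytes;
    }
}
```

The cost is that every read path has to take the lock, which is the performance concern with this option.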

I would like to consider a second solution:

Given that our binary segments are immutable and materialization is 
idempotent, the root cause of the thread-unsafety is that the state is spread 
over three fields: segments, offset, and sizeInBytes. If we introduce an 
immutable Binary object to represent them, we can do this:
{code:java}
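// Lazily materializes the binary form. A race is benign here: materialize()
// is idempotent and Binary is immutable, so the worst case is duplicate work.
public void ensureMaterialized() {
   if (binary == null) {
      binary = materialize();
   }
}

// Encodes the Java string and wraps the bytes in a single immutable Binary.
public Binary materialize() {
   byte[] bytes = StringUtf8Utils.encodeUTF8(javaObject);
   return new Binary(
      new MemorySegment[] {MemorySegmentFactory.wrap(bytes)}, 0, bytes.length);
}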

public static final class Binary {
   public final MemorySegment[] segments;
   public final int offset;
   public final int sizeInBytes;

   public Binary(MemorySegment[] segments, int offset, int sizeInBytes) {
      this.segments = segments;
      this.offset = offset;
      this.sizeInBytes = sizeInBytes;
   }
}
{code}
In this case, binary is like the hash field of the JDK String, which is a 
thread-safe, lazily initialized field. The only cost is one extra Java object, 
which is small.
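
To make the String.hash analogy concrete, here is a self-contained sketch of the same idiom. RacyLazyString is a hypothetical name, and a plain byte[] replaces MemorySegment[] so it runs without Flink. One detail I am assuming we would want, as String.hashCode() does: read the shared field once into a local and return the local, so a caller never re-reads the field and observes a stale null.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in; byte[] replaces MemorySegment[] to stay self-contained.
final class RacyLazyString {
    // Immutable holder: its final fields are visible once the reference is seen.
    static final class Binary {
        final byte[] bytes;
        final int offset;
        final int sizeInBytes;

        Binary(byte[] bytes, int offset, int sizeInBytes) {
            this.bytes = bytes;
            this.offset = offset;
            this.sizeInBytes = sizeInBytes;
        }
    }

    private final String javaObject;
    private Binary binary; // deliberately not volatile, like String's hash field

    RacyLazyString(String javaObject) {
        this.javaObject = javaObject;
    }

    Binary ensureMaterialized() {
        Binary b = binary; // read the shared field exactly once
        if (b == null) {
            byte[] encoded = javaObject.getBytes(StandardCharsets.UTF_8);
            b = new Binary(encoded, 0, encoded.length);
            binary = b; // racing threads may overwrite with an equivalent value
        }
        return b; // return the local, never re-read the field
    }
}
```

Two threads may both materialize, but each ends up with a complete Binary, so segments, offset and sizeInBytes can never be observed in a mixed state.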

What do you think? Does it address your concerns? 

> BaseMapSerializerTest.testDuplicate fails on Travis
> ---------------------------------------------------
>
>                 Key: FLINK-13702
>                 URL: https://issues.apache.org/jira/browse/FLINK-13702
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Planner
>    Affects Versions: 1.10.0
>            Reporter: Till Rohrmann
>            Assignee: Dawid Wysakowicz
>            Priority: Critical
>              Labels: test-stability
>
> The {{BaseMapSerializerTest.testDuplicate}} fails on Travis with an 
> {{java.lang.IndexOutOfBoundsException}}.
> https://api.travis-ci.org/v3/job/570973199/log.txt



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
