[ 
https://issues.apache.org/jira/browse/FLINK-23886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403501#comment-17403501
 ] 

xiangqiao commented on FLINK-23886:
-----------------------------------

RocksDBCachingPriorityQueueSet#deserializeElement method will share thread 
unsafe DataInputDeserializer inputview, but this method will be called by 
multiple threads, such as registerProcessingTimeTimer method of task thread and 
onProcessingTime method of processing time trigger callback thread, Similarly, 
the RocksDBCachingPriorityQueueSet#serializeElement method may have the same 
problem。
 
Hello [~yunta] , can you help confirm whether this problem exists?

 

InternalTimerServiceImpl#registerProcessingTimeTimer
{code:java}
@Override
public void registerProcessingTimeTimer(N namespace, long time) {
   InternalTimer<K, N> oldHead = processingTimeTimersQueue.peek();
   if (processingTimeTimersQueue.add(new TimerHeapInternalTimer<>(time, (K) 
keyContext.getCurrentKey(), namespace))) {
      long nextTriggerTime = oldHead != null ? oldHead.getTimestamp() : 
Long.MAX_VALUE;
      // check if we need to re-schedule our timer to earlier
      if (time < nextTriggerTime) {
         if (nextTimer != null) {
            nextTimer.cancel(false);
         }
         nextTimer = processingTimeService.registerTimer(time, 
this::onProcessingTime);
      }
   }
}

private void onProcessingTime(long time) throws Exception {
   // null out the timer in case the Triggerable calls 
registerProcessingTimeTimer()
   // inside the callback.
   nextTimer = null;

   InternalTimer<K, N> timer;

   while ((timer = processingTimeTimersQueue.peek()) != null && 
timer.getTimestamp() <= time) {
      processingTimeTimersQueue.poll();
      keyContext.setCurrentKey(timer.getKey());
      triggerTarget.onProcessingTime(timer);
   }

   if (timer != null && nextTimer == null) {
      nextTimer = processingTimeService.registerTimer(timer.getTimestamp(), 
this::onProcessingTime);
   }
}
{code}
 

RocksDBCachingPriorityQueueSet#deserializeElement:
{code:java}
@Nonnull
private E deserializeElement(@Nonnull byte[] bytes) {
   try {
      final int numPrefixBytes = groupPrefixBytes.length;
      inputView.setBuffer(bytes, numPrefixBytes, bytes.length - numPrefixBytes);
      return byteOrderProducingSerializer.deserialize(inputView);
   } catch (IOException e) {
      throw new FlinkRuntimeException("Error while deserializing the element.", 
e);
   }
}
{code}

> An exception is thrown out when recover job timers from checkpoint file
> -----------------------------------------------------------------------
>
>                 Key: FLINK-23886
>                 URL: https://issues.apache.org/jira/browse/FLINK-23886
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends
>            Reporter: JING ZHANG
>            Priority: Major
>
> A user report the bug in the [mailist. 
> |http://mail-archives.apache.org/mod_mbox/flink-user/202108.mbox/%3ccakmsf43j14nkjmgjuy4dh5qn2vbjtw4tfh4pmmuyvcvfhgf...@mail.gmail.com%3E]I
>  paste the content here.
> Setup Specifics:
>  Version: 1.6.2
>  RocksDB Map State
>  Timers stored in rocksdb
>   
>  When we have this job running for long periods of time like > 30 days, if 
> for some reason the job restarts, we encounter "Error while deserializing the 
> element". Is this a known issue fixed in later versions? I see some changes 
> to code for FLINK-10175, but we don't use any queryable state 
>   
>  Below is the stack trace
>   
>  org.apache.flink.util.FlinkRuntimeException: Error while deserializing the 
> element.
> at 
> org.apache.flink.contrib.streaming.state.RocksDBCachingPriorityQueueSet.deserializeElement(RocksDBCachingPriorityQueueSet.java:389)
> at 
> org.apache.flink.contrib.streaming.state.RocksDBCachingPriorityQueueSet.peek(RocksDBCachingPriorityQueueSet.java:146)
> at 
> org.apache.flink.contrib.streaming.state.RocksDBCachingPriorityQueueSet.peek(RocksDBCachingPriorityQueueSet.java:56)
> at 
> org.apache.flink.runtime.state.heap.KeyGroupPartitionedPriorityQueue$InternalPriorityQueueComparator.comparePriority(KeyGroupPartitionedPriorityQueue.java:274)
> at 
> org.apache.flink.runtime.state.heap.KeyGroupPartitionedPriorityQueue$InternalPriorityQueueComparator.comparePriority(KeyGroupPartitionedPriorityQueue.java:261)
> at 
> org.apache.flink.runtime.state.heap.HeapPriorityQueue.isElementPriorityLessThen(HeapPriorityQueue.java:164)
> at 
> org.apache.flink.runtime.state.heap.HeapPriorityQueue.siftUp(HeapPriorityQueue.java:121)
> at 
> org.apache.flink.runtime.state.heap.HeapPriorityQueue.addInternal(HeapPriorityQueue.java:85)
> at 
> org.apache.flink.runtime.state.heap.AbstractHeapPriorityQueue.add(AbstractHeapPriorityQueue.java:73)
> at 
> org.apache.flink.runtime.state.heap.KeyGroupPartitionedPriorityQueue.<init>(KeyGroupPartitionedPriorityQueue.java:89)
> at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBPriorityQueueSetFactory.create(RocksDBKeyedStateBackend.java:2792)
> at 
> org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.create(RocksDBKeyedStateBackend.java:450)
> at 
> org.apache.flink.streaming.api.operators.InternalTimeServiceManager.createTimerPriorityQueue(InternalTimeServiceManager.java:121)
> at 
> org.apache.flink.streaming.api.operators.InternalTimeServiceManager.registerOrGetTimerService(InternalTimeServiceManager.java:106)
> at 
> org.apache.flink.streaming.api.operators.InternalTimeServiceManager.getInternalTimerService(InternalTimeServiceManager.java:87)
> at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator.getInternalTimerService(AbstractStreamOperator.java:764)
> at 
> org.apache.flink.streaming.api.operators.KeyedProcessOperator.open(KeyedProcessOperator.java:61)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:424)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:290)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:290)
> at org.apache.flink.types.StringValue.readString(StringValue.java:769)
> at 
> org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:69)
> at 
> org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:28)
> at 
> org.apache.flink.api.java.typeutils.runtime.RowSerializer.deserialize(RowSerializer.java:179)
> at 
> org.apache.flink.api.java.typeutils.runtime.RowSerializer.deserialize(RowSerializer.java:46)
> at 
> org.apache.flink.streaming.api.operators.TimerSerializer.deserialize(TimerSerializer.java:168)
> at 
> org.apache.flink.streaming.api.operators.TimerSerializer.deserialize(TimerSerializer.java:45)
> at 
> org.apache.flink.contrib.streaming.state.RocksDBCachingPriorityQueueSet.deserializeElement(RocksDBCachingPriorityQueueSet.java:387)
> ... 20 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to