Zihan Li created GOBBLIN-1343:
---------------------------------
Summary: Fix the data loss issue caused by the cache expiration in
PartitionedDataWriter
Key: GOBBLIN-1343
URL: https://issues.apache.org/jira/browse/GOBBLIN-1343
Project: Apache Gobblin
Issue Type: Task
Reporter: Zihan Li
Problem statement:
Previously, we maintained a cache of writers in PartitionedDataWriter to avoid
accumulating writers in memory in long-running jobs. But when a writer expires
from the cache, we only close it without a flush/commit, so data may be lost
when HDFS is slow.
Potential solutions:
# In the removal logic, we can make sure the writer has been committed
correctly, i.e. force it to commit before closing. But the issue here is that
we still remove the writer from the cache, so the next flush message will be
handled and return without calling commit on the right writer, and the
watermark will move without data being published to HDFS.
# We measure the time taken by the write operation, and if it takes a long
time, we force the writer back into the cache so that the next flush message
will be picked up by the writer.
Here we use the second solution.
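
A minimal Java sketch of the second solution follows. It is an illustration under stated assumptions, not the actual patch. SketchWriter, SketchCache, the manual clock, and the "elapsed >= expiry" threshold are all hypothetical and are not Gobblin or Guava APIs. The cache is modeled as expire-after-access with lazy eviction. A write that takes longer than the expiry interval puts the writer back into the cache, which refreshes its entry so that the next flush still finds and commits it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class WriterCacheSketch {
    /** Hypothetical writer that buffers records until commit(). */
    static class SketchWriter {
        final List<String> buffered = new ArrayList<>();
        final List<String> committed = new ArrayList<>();
        boolean closed = false;

        void commit() { committed.addAll(buffered); buffered.clear(); }
        void close() { closed = true; } // closing alone does not publish buffered records
    }

    /** Hypothetical expire-after-access cache driven by a manual clock. */
    static class SketchCache {
        final long expiryMillis;
        final boolean reAddAfterSlowWrite;
        long nowMillis = 0; // manual clock keeps the demo deterministic
        final Map<String, SketchWriter> writers = new HashMap<>();
        final Map<String, Long> lastAccess = new HashMap<>();

        SketchCache(long expiryMillis, boolean reAddAfterSlowWrite) {
            this.expiryMillis = expiryMillis;
            this.reAddAfterSlowWrite = reAddAfterSlowWrite;
        }

        /** Drops entries idle for at least expiryMillis, closing them without a commit. */
        void evictExpired() {
            Iterator<Map.Entry<String, Long>> it = lastAccess.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> e = it.next();
                if (nowMillis - e.getValue() >= expiryMillis) {
                    writers.remove(e.getKey()).close();
                    it.remove();
                }
            }
        }

        void put(String partition, SketchWriter writer) {
            writers.put(partition, writer);
            lastAccess.put(partition, nowMillis);
        }

        /** Writes one record; simulatedWriteMillis stands in for HDFS latency. */
        SketchWriter write(String partition, String record, long simulatedWriteMillis) {
            evictExpired();
            SketchWriter writer = writers.get(partition);
            if (writer == null) {
                writer = new SketchWriter();
            }
            put(partition, writer);
            long start = nowMillis;
            writer.buffered.add(record);
            nowMillis += simulatedWriteMillis; // the slow write "takes" this long
            long elapsed = nowMillis - start;
            if (reAddAfterSlowWrite && elapsed >= expiryMillis) {
                put(partition, writer); // refresh the entry so the next flush still sees it
            }
            return writer;
        }

        /** Handles a flush message: commits every writer still in the cache. */
        void flush() {
            evictExpired();
            for (SketchWriter w : writers.values()) {
                w.commit();
            }
        }
    }

    /** Runs one slow write followed by a flush and returns the committed records. */
    static List<String> demo(boolean reAdd) {
        SketchCache cache = new SketchCache(1000, reAdd);
        SketchWriter writer = cache.write("p1", "r1", 1500);
        cache.flush();
        return writer.committed;
    }

    public static void main(String[] args) {
        System.out.println("without re-add: " + demo(false)); // prints "without re-add: []"
        System.out.println("with re-add: " + demo(true));     // prints "with re-add: [r1]"
    }
}
```

In the run without the re-add, the entry has already expired by the time the flush arrives, so the writer is closed with its record still buffered. That is the data loss described in the problem statement. With the re-add, the same flush commits the record.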
--
This message was sent by Atlassian Jira
(v8.3.4#803005)