[
https://issues.apache.org/jira/browse/GOBBLIN-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zihan Li resolved GOBBLIN-1343.
-------------------------------
Resolution: Fixed
> Fix the data loss issue caused by the cache expiration in
> PartitionedDataWriter
> -------------------------------------------------------------------------------
>
> Key: GOBBLIN-1343
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1343
> Project: Apache Gobblin
> Issue Type: Task
> Reporter: Zihan Li
> Priority: Major
>
> Problem statement:
> Previously, we maintained a cache in PartitionedDataWriter to avoid
> accumulating writers in memory in long-running jobs. But when we expire a
> writer, we only close it without flushing/committing, so data may be lost
> when HDFS is slow.
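As an illustration of the failure mode (hypothetical names; this is a minimal sketch, not Gobblin's actual DataWriter code), an expiration hook that only closes a writer silently drops whatever that writer still has buffered, whereas committing first preserves it:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the data-loss path: cache eviction closes the writer without
// committing it, so buffered records never reach HDFS.
public class ExpirationBugSketch {
    // Hypothetical buffered writer; Gobblin's real interface differs.
    static class BufferedWriter {
        final List<String> buffer = new ArrayList<>();
        final List<String> committed = new ArrayList<>();
        boolean closed = false;

        void write(String record) { buffer.add(record); }
        void commit() { committed.addAll(buffer); buffer.clear(); }
        void close() { closed = true; buffer.clear(); } // buffered data dropped
    }

    // Buggy eviction path: close only, no commit.
    static void onExpireBuggy(BufferedWriter w) { w.close(); }

    // Safe eviction path: commit before close.
    static void onExpireSafe(BufferedWriter w) { w.commit(); w.close(); }
}
```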
>
> Potential solution:
> # In the removal logic, we can make sure the writer has been committed
> correctly, i.e. force it to commit before closing. But the issue here is
> that we still remove the writer from the cache, so the next flush message
> will be handled and returned without calling commit on the right writer,
> and the watermark will advance without the data being published to HDFS.
> # We time the write operation, and if it takes a long time, we force the
> writer back into the cache so that the next flush message will be picked
> up by the writer.
> Here we use the second solution.
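The second solution could be sketched roughly as follows (a minimal illustration under assumed names; the `Writer` interface, cache shape, and threshold are not Gobblin's actual API). A slow write means the expiry timer may have evicted the writer mid-write, so the writer is put back afterwards and the next flush commits it before the watermark moves:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of solution 2: measure each write's duration and, when it was slow
// enough that the cache may have expired the writer meanwhile, re-insert the
// writer so the next flush finds and commits it.
public class SlowWriteReinsertSketch {
    // Hypothetical writer abstraction.
    interface Writer {
        void write(String record);
        void commit();
    }

    private final ConcurrentMap<String, Writer> cache = new ConcurrentHashMap<>();
    private final long slowWriteThresholdMs; // assumed tuning knob

    public SlowWriteReinsertSketch(long slowWriteThresholdMs) {
        this.slowWriteThresholdMs = slowWriteThresholdMs;
    }

    public void write(String partition, Writer writer, String record) {
        long start = System.currentTimeMillis();
        writer.write(record);
        long elapsedMs = System.currentTimeMillis() - start;
        // A slow write means expiry may have evicted this writer; force it
        // back so the upcoming flush commits its data.
        if (elapsedMs >= slowWriteThresholdMs) {
            cache.put(partition, writer);
        }
    }

    public void flush() {
        // Commit every cached writer so the watermark only advances after
        // the data has been published.
        for (Writer w : cache.values()) {
            w.commit();
        }
    }

    public boolean isCached(String partition) {
        return cache.containsKey(partition);
    }
}
```

The key property is that the flush path and the eviction path can no longer race past each other: a writer that was busy writing during expiry is guaranteed to be visible to the next flush.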
--
This message was sent by Atlassian Jira
(v8.3.4#803005)