[
https://issues.apache.org/jira/browse/NIFI-3686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15967779#comment-15967779
]
Mark Payne commented on NIFI-3686:
----------------------------------
[~mosermw] The FileSystemSwapManager writes the contents of a swap file to a
temp file, then performs an fsync, and finally renames the file. So there
should be no way to get an EOFException unless the file is actually corrupt -
it should not be due to the contents not being completely written out. I tried
to replicate the behavior locally by creating a 100 MB partition and putting
the FlowFile repo there, but I wasn't able to reproduce the problem. I mention
this because there may be more to this story than simply running out of disk
space and not being able to finish writing the file.
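As a rough illustration of the write-to-temp, fsync, then rename sequence
described above, here is a minimal sketch. It is not the actual
FileSystemSwapManager code; the class name, method name and the ".part"
suffix are made up for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only (not NiFi code).
final class AtomicSwapWrite {

    static void writeAtomically(Path target, byte[] contents) throws IOException {
        // 1. Write everything to a temp file next to the final location.
        //    The ".part" suffix is an assumption made for this sketch.
        Path temp = target.resolveSibling(target.getFileName() + ".part");
        try (FileChannel channel = FileChannel.open(temp,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buffer = ByteBuffer.wrap(contents);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // 2. fsync: force file content and metadata to the storage device.
            channel.force(true);
        }
        // 3. Rename into place. The final name only ever refers to a file
        //    whose contents were fully written and synced.
        Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

With this ordering, a write that fails partway (for example on a full disk)
leaves only the temp file behind, which is why a truncated file under the
final name would be unexpected.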
In any case, though, I think when we are swapping it in, we should not assume
that an EOFException would dictate that we can lose all FlowFiles. We need to
ensure that we are able to recover those FlowFiles that we can. Unfortunately,
looking at it now, it looks like the schema that we are using has a single
element named "FlowFiles" and the Swap File is expected to consist of a single
"Record." We'd need to update the schema so that it allows each FlowFile to be
written as a separate Record. The downside is that the new schema would be
incompatible with the old one. We could still remain backward compatible but
would lose "forward compatibility" -- meaning that if a Swap File gets written
in the new format, we won't be able to recover that swap file if we rolled
back to an old version of NiFi...
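To make the "one Record per FlowFile" idea concrete, here is a minimal
sketch of reading records one at a time and keeping whatever was read
completely before an EOFException. The length-prefixed layout and all names
are assumptions made for the example; this is not the NiFi swap schema or
its deserializer.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical format: each record is a 4-byte length followed by UTF-8 bytes.
final class PerRecordSwapSketch {

    static byte[] write(List<String> records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String record : records) {
                byte[] payload = record.getBytes(StandardCharsets.UTF_8);
                out.writeInt(payload.length);
                out.write(payload);
            }
        }
        return bytes.toByteArray();
    }

    // Returns every record that could be read in full. A record cut off by
    // the end of the stream is dropped; earlier records are kept.
    static List<String> readRecoverable(InputStream in) throws IOException {
        List<String> recovered = new ArrayList<>();
        DataInputStream data = new DataInputStream(in);
        try {
            while (true) {
                int length = data.readInt();   // EOFException at end of stream
                if (length < 0) {
                    break;                     // garbage length: stop reading
                }
                byte[] payload = new byte[length];
                data.readFully(payload);       // EOFException if truncated
                recovered.add(new String(payload, StandardCharsets.UTF_8));
            }
        } catch (EOFException e) {
            // Clean end of file or a truncated last record: keep what we have.
        }
        return recovered;
    }
}
```

In this sketch a clean end of file and a truncated file end the loop the
same way; a real format would likely also store the expected record count so
the caller can tell the two cases apart and report the loss.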
> EOFException on swap in causes tight loop in polling for flowfiles
> ------------------------------------------------------------------
>
> Key: NIFI-3686
> URL: https://issues.apache.org/jira/browse/NIFI-3686
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.1.1
> Reporter: Michael Moser
>
> If the flowfile_repository partition fills to 100% while swapping files out
> to a new swap file, then this swap file becomes corrupt (partially written).
> When NiFi tries to swap this file in, an EOFException happens and we get the
> following ERROR, which is nice.
> 2017-04-10 18:02:58,855 ERROR [Timer-Driven Process Thread-3]
> o.a.n.controller.StandardFlowFileQueue Failed to swap in FlowFiles from Swap
> File
> /local/mwmoser/nifi-1.2.0-SNAPSHOT/./flowfile_repository/swap/1491574631605-2840b630-57fc-4f49-615b-0b37d77bec66-5dbc0ad0-921c-483e-a05d-5c65d014fa48.swap;
> Swap File appears to be corrupt!
> However, once all other dataflow stops, the queue now shows 10000 flowfiles
> in it. The processor reading from this queue constantly has its onTrigger()
> called, and session.get() polls the queue and gets 0 files returned. This
> happens in a tight loop, with no other errors.
> To a user it appears that the processor is doing lots of work but just not
> processing those 10000 files. The error message above only appears once in
> the nifi-app.log, so you don't see anything wrong if you tail the log.
> When you restart NiFi, the error message above appears again, but the user
> experience of 10000 files not processing remains.
> The new SchemaSwapDeserializer does not (and perhaps cannot) implement the
> IncompleteSwapFileException that the old SimpleSwapDeserializer does. So,
> reading a swap file is currently all-or-nothing.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)