Hi everyone,

Some weeks ago we discussed the VolatileContentRepository and the JIRA ticket NIFI-8760 <https://issues.apache.org/jira/browse/NIFI-8760>. This implementation did not seem to be a priority, but my team, and probably others, may still want to use it, so I worked on a fix. To recall, the bug was introduced in 1.13.1, when ResourceClaim-based reading was introduced, especially for the FileSystemRepository. It makes perfect sense there, since it lets several ContentClaims be read through a single InputStream, but it does not seem to work for the VolatileContentRepository.
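To make the failure concrete, here is a minimal, self-contained Java sketch of the lookup problem and of the two fixes described below. It does not use NiFi's real classes. `SimpleResourceClaim`, `SimpleContentClaim` and `VolatileLookupSketch` are hypothetical stand-ins, and the storage is reduced to a map from claim to byte array. The sketch assumes, as described in this thread, that content-claim equality includes the length and that a read by resource claim builds a claim at offset 0 with length -1. It requires Java 16+ for records.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class VolatileLookupSketch {

    // Hypothetical stand-in for a ResourceClaim: identified by an id only.
    record SimpleResourceClaim(String id) {}

    // Hypothetical stand-in for a ContentClaim. A record's equals()/hashCode()
    // use every component, so the length takes part in equality, mirroring the
    // behaviour described in the thread.
    record SimpleContentClaim(SimpleResourceClaim resource, long offset, long length) {}

    // Stand-in for the Map<ContentClaim, ContentBlock> storage (blocks as byte arrays).
    private final Map<SimpleContentClaim, byte[]> blocks = new ConcurrentHashMap<>();

    void write(SimpleContentClaim claim, byte[] data) {
        blocks.put(claim, data);
    }

    // Failing behaviour: build a claim at offset 0 with length -1 and look it up
    // by equality. A stored claim with a real length never matches, so this returns null.
    byte[] readByResourceNaive(SimpleResourceClaim resource) {
        return blocks.get(new SimpleContentClaim(resource, 0, -1));
    }

    // Fix 1 (sketch): scan for the claim with the same resource and offset 0,
    // ignoring the length. Linear in the number of stored claims.
    Optional<byte[]> readByResourceScan(SimpleResourceClaim resource) {
        return blocks.entrySet().stream()
                .filter(e -> e.getKey().resource().equals(resource) && e.getKey().offset() == 0)
                .map(Map.Entry::getValue)
                .findFirst();
    }

    // Fix 2 (sketch): refuse reads by resource claim; only reads by content claim are supported.
    byte[] readByResourceUnsupported(SimpleResourceClaim resource) {
        throw new UnsupportedOperationException("read by resource claim is not supported");
    }

    // Exact lookup by content claim, which keeps working with either fix.
    byte[] readByContentClaim(SimpleContentClaim claim) {
        return blocks.get(claim);
    }

    public static void main(String[] args) {
        VolatileLookupSketch repo = new VolatileLookupSketch();
        SimpleResourceClaim resource = new SimpleResourceClaim("resource-1");
        SimpleContentClaim claim = new SimpleContentClaim(resource, 0, 5);
        repo.write(claim, "hello".getBytes(StandardCharsets.UTF_8));

        System.out.println(repo.readByResourceNaive(resource) == null);    // true
        System.out.println(repo.readByResourceScan(resource).isPresent()); // true
        System.out.println(new String(repo.readByContentClaim(claim), StandardCharsets.UTF_8)); // hello
        try {
            repo.readByResourceUnsupported(resource);
        } catch (UnsupportedOperationException e) {
            System.out.println("unsupported");
        }
    }
}
```

The scan gives up the direct map lookup for a linear pass over the stored claims, which is the drawback mentioned for the first fix. The second fix keeps lookups exact but makes any read by resource claim fail immediately with an exception.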

Today I have two simple fixes that are equivalent in terms of performance (tested with GenerateFlowFile, MergeRecord, SplitJson and QueryRecord):

- The first follows the idea of the original implementation <https://github.com/apache/nifi/blob/528fce2407d092d4ced1a58fcc14d0bc6e660b89/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/repository/VolatileContentRepository.java#L473>, which, for a ResourceClaim, looked up the corresponding ContentClaim at offset 0. This does not work when the ContentClaim being searched for has a length, because ContentClaim implements an equals() that takes the length into account, and the constructor called by read(ResourceClaim) initializes the length to -1. So a fix could be to search the map for the ContentClaim that matches the ResourceClaim and offset 0. As I said, even though this implementation seems poor, since it does not take advantage of the Map's Comparable keys to find the ContentClaim, its performance seems equivalent to that of the second fix.
- The second is simply to treat the VolatileContentRepository as incompatible with read(ResourceClaim) and to allow only read(ContentClaim), as is the case for the EncryptedFileSystemRepository.

Since the data in this implementation is stored as a Map<ContentClaim, ContentBlock>, I lack the experience to answer these questions:

- Does it make sense to try to use the ResourceClaim to retrieve ContentBlock(s) in the case of a VolatileContentRepository?
- If so, could there be a benefit in retrieving the ContentBlocks at every offset matching the ResourceClaim, instead of only at offset 0 as originally intended?
- If not, the second fix is probably the right one.

Please don't hesitate to correct me if I'm wrong or have misunderstood something.

Thank you and have a nice day,
Matthieu

On Thu, Aug 5, 2021 at 12:36, Matthieu Ré <[email protected]> wrote:
> Thank you very much for your answers!
>
> That's a surprise!
> The VolatileContentRepository seemed to perfectly meet our need to process a large amount of data with few resources, and especially with low I/O on mounted disks, for non-critical data where potential data loss is acceptable.
>
> I just tried your solution, @Mark, mounting a tmpfs and using the FileSystemRepository (on 1.11.4). For the same amount of data and the same RAM, however, the VolatileContentRepository used a constant <5% of the space, while the FileSystemRepository used a very unstable amount of space and frequently ran out of it. (I should add that we don't store any archive: nifi.content.repository.archive.enabled=false.) Am I missing a configuration that makes the FileSystemRepository consume a lot of space?
>
> Stateless NiFi sounds very interesting! I just had a look at pvillard's demo (https://github.com/pvillard31/nifi-stateless-demo) and the framework's README (https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-stateless), but do you have any more resources about it? I would like to understand a bit better how it differs from the standard framework and how it could fit our use case.
>
> Have a nice day,
> Matthieu
>
> On Fri, Jul 23, 2021 at 17:20, Joe Witt <[email protected]> wrote:
>> It seems like for any use case that we previously thought VolatileContentRepo would be good for, we'd now say Stateless NiFi is a dramatically better approach.
>>
>> We need to document this better, but the capability is there now for sure.
>>
>> On Fri, Jul 23, 2021 at 8:13 AM Mark Payne <[email protected]> wrote:
>>> Matthieu,
>>>
>>> I would highly recommend against using VolatileContentRepository. You’re the first one I’ve heard of using it in a few years. Typically, the FileSystemRepository is sufficient. If you truly want to run with the content in RAM, I would recommend creating a RAM disk and pointing the FileSystemRepository to that.
>>>
>>> Thanks
>>> -Mark
>>>
>>> On Jul 21, 2021, at 10:31 AM, Matthieu Ré <[email protected]> wrote:
>>>
>>> Hi Chris, thank you for your quick response.
>>>
>>> I tried the flow with 1.13.1, 1.13.2, and 1.14.0 (just before the first RC), and the problem was still there, so I am not sure it is related to the session handling issue you pointed out, which was fixed in 1.13.2.
>>>
>>> On Wed, Jul 21, 2021 at 16:22, Chris Sampson <[email protected]> wrote:
>>>> 1.13.1 was known to have problems with session handling - see the Release Note "lowlights" for 1.13.1 [1]
>>>>
>>>> It is recommended to upgrade to version 1.13.2 (or the latest 1.14.0). If you can't upgrade, then 1.13.0 would be better than 1.13.1.
>>>>
>>>> [1] https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.13.1
>>>>
>>>> ---
>>>> *Chris Sampson*
>>>> IT Consultant
>>>> [email protected]
>>>> <https://www.naimuri.com/>
>>>>
>>>> On Wed, 21 Jul 2021 at 15:14, Matthieu Ré <[email protected]> wrote:
>>>>> Hi all,
>>>>>
>>>>> We are currently using NiFi 1.11.4 and face a blocking issue when trying to switch to NiFi 1.13.1+ because of the VolatileContentRepository: some processors we use (and probably others that we didn't try), such as MergeRecord, QueryRecord and SplitJson, were not able to process flowfiles (logs are in the Jira ticket NIFI-8760 <https://issues.apache.org/jira/browse/NIFI-8760>).
>>>>>
>>>>> I wanted to know whether any of you can reproduce the issue, and whether this might be a misconfiguration on our side. The nifi.properties and flow.xml.gz we used are available in the ticket. If I am not missing anything, the issue seems to come from this commit <https://github.com/apache/nifi/commit/528fce2407d092d4ced1a58fcc14d0bc6e660b89>, since it appeared with 1.13.1 and the flow works fine with 1.13.0.
>>>>>
>>>>> I am open to contributing as much as I can if you confirm that this is not due to a misconfiguration.
>>>>>
>>>>> Thanks!
>>>>> Matthieu
>>>
>>> --
>>> Matthieu RÉ
>>> Data Scientist - Machine Learning Engineer - Dassault Systèmes
>>> ENSIIE, M2 AIC (Université Paris-Saclay)
>>> Tel: 0631609755
>>> Email: [email protected]
