[
https://issues.apache.org/jira/browse/NIFI-12027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762517#comment-17762517
]
Joe Witt commented on NIFI-12027:
---------------------------------
It looks like the Caffeine cache in use could easily be augmented with simple
time-based expireAfter or refreshAfter logic, so that schemas are always
updated within 'x time frame'. That gives a best-of-both-worlds mode: we get
good cache hit rates but also don't go too long before we pick up updates.
Alternatively, we could evict/remove entries based on incoming schema changes,
but that seems less reliable.
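
To make the time-based idea concrete, here is a minimal sketch. It deliberately
does not use Caffeine: it is a stdlib-only stand-in, and the class name
ExpiringSchemaCache, its constructor and the loader argument are all
hypothetical, not NiFi code. With Caffeine itself the equivalent would
presumably be a builder option such as expireAfterWrite(Duration) or, on a
loading cache, refreshAfterWrite(Duration).

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical stand-in for a schema cache whose entries expire after a fixed time.
final class ExpiringSchemaCache<K, V> {

    // A cached value plus the clock reading at which it was loaded.
    private static final class Entry<T> {
        final T value;
        final long loadedAtNanos;

        Entry(T value, long loadedAtNanos) {
            this.value = value;
            this.loadedAtNanos = loadedAtNanos;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlNanos;
    private final LongSupplier clock; // injectable so expiry can be tested without sleeping

    ExpiringSchemaCache(long ttlNanos, LongSupplier clock) {
        this.ttlNanos = ttlNanos;
        this.clock = clock;
    }

    // Returns the cached value, reloading it once it is at least ttlNanos old.
    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) ->
                (old == null || now - old.loadedAtNanos >= ttlNanos)
                        ? new Entry<>(loader.apply(k), now)
                        : old);
        return entry.value;
    }

    public static void main(String[] args) {
        // Assumed 5 minute bound; the right value for 'x time frame' is an open question.
        ExpiringSchemaCache<String, String> cache =
                new ExpiringSchemaCache<>(Duration.ofMinutes(5).toNanos(), System::nanoTime);
        System.out.println(cache.get("my_table", t -> "columns of " + t));
    }
}
```

With something like this, a table altered in the database would be picked up
after at most one TTL without a processor restart, while lookups inside the
window still hit the cache.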
> PutDatabaseRecord should invalidate schema cache entries
> --------------------------------------------------------
>
> Key: NIFI-12027
> URL: https://issues.apache.org/jira/browse/NIFI-12027
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Joe Witt
> Priority: Major
>
> Seen on the NiFi main/2.0 line, and also confirmed by a user on the 1.15.x line.
> If you have a flow of records, such as CSV records, feeding into a
> PutDatabaseRecord and you add columns to the source data, things flow normally
> as long as the 'ignore new columns/fields' properties are set, since the new
> column values are just ignored. However, when you then add those new columns
> to the database table you're writing to, the values are still not sent
> and end up as nulls (if the database allows them). If you stop/start the
> PutDatabaseRecord processor, though, the values start getting set.
> This is with the default table schema cache value of 100. If you set that
> value to 0, it appears to work fine without restarting. This suggests
> that our caching default is likely too simplistic. We should have some
> mechanism whereby schema changes in the database are detected and invalidate
> any schemas we have cached. Or we should invalidate when we detect a
> difference in the incoming schema of the data. As it stands, the current
> behavior leaves the user, I think, having to choose between the default,
> which is likely 'faster', and a slower but more dynamic path.
> Perhaps the other question here is how valuable is that schema cache in the
> first place?
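
The other option raised in the description, invalidating when the incoming
data's schema differs from what is cached, could be sketched roughly as below.
This is only an illustration under stated assumptions: SchemaMismatchInvalidator,
columnsFor and the loader function are hypothetical names, column names are
compared as exact strings, and none of this is taken from the PutDatabaseRecord
source.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache of table column names that reloads when a record
// carries a field the cached schema does not know about.
final class SchemaMismatchInvalidator {

    private final Map<String, Set<String>> columnsByTable = new ConcurrentHashMap<>();
    private final Function<String, Set<String>> loader; // e.g. a database metadata lookup

    SchemaMismatchInvalidator(Function<String, Set<String>> loader) {
        this.loader = loader;
    }

    // Returns the table's columns, refreshing the cache entry if the record
    // has at least one field that is missing from the cached column set.
    Set<String> columnsFor(String table, Set<String> recordFields) {
        Set<String> cached = columnsByTable.computeIfAbsent(table, loader);
        if (!cached.containsAll(recordFields)) {
            // The cached schema may be stale: reload once and replace the entry.
            cached = loader.apply(table);
            columnsByTable.put(table, cached);
        }
        return cached;
    }

    public static void main(String[] args) {
        SchemaMismatchInvalidator cache =
                new SchemaMismatchInvalidator(t -> Set.of("id", "name"));
        System.out.println(cache.columnsFor("users", Set.of("id")).contains("name"));
    }
}
```

One cost of this sketch: if the extra field really does not exist in the table
(the 'ignore new columns/fields' case), every such record triggers a reload, so
the cache stops helping for that flow. Column drops or type changes would also
go unnoticed.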