[ 
https://issues.apache.org/jira/browse/NIFI-12027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762517#comment-17762517
 ] 

Joe Witt commented on NIFI-12027:
---------------------------------

Looks like the Caffeine cache used here could easily be augmented with simple 
time-based expireAfter or refreshAfter logic, so that within 'x time frame' we 
always update schemas.  That gives a best-of-both-worlds mode where we get 
good cache hit rates but also don't go too long before picking up updates.  Or 
we could evict/remove entries based on incoming schema changes, but that 
seems less reliable.
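
To make the time-based option concrete, here is a minimal sketch of 
expire-after-write behavior.  It is not the PutDatabaseRecord code and it does 
not use Caffeine itself; the class name ExpiringSchemaCache, the injected 
clock, and the loader function are all assumptions made for illustration, 
using only the JDK.  In the real processor the same effect would presumably 
come from Caffeine's builder options such as expireAfterWrite or 
refreshAfterWrite.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/**
 * Hypothetical stand-in for a table schema cache with expire-after-write
 * semantics: an entry is reloaded once it is older than the configured TTL.
 */
final class ExpiringSchemaCache<K, V> {

    /** A cached value plus the clock reading at which it was loaded. */
    private static final class Entry<V> {
        final V value;
        final long writtenAt;

        Entry(V value, long writtenAt) {
            this.value = value;
            this.writtenAt = writtenAt;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttl;
    private final LongSupplier clock; // injected so expiry can be tested deterministically

    ExpiringSchemaCache(long ttl, LongSupplier clock) {
        this.ttl = ttl;
        this.clock = clock;
    }

    /** Returns the cached value, reloading it if it is missing or older than the TTL. */
    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) -> {
            if (old != null && now - old.writtenAt < ttl) {
                return old; // still fresh: cache hit
            }
            return new Entry<>(loader.apply(k), now); // missing or expired: reload
        });
        return entry.value;
    }

    /** Explicit removal, e.g. if a schema change is detected some other way. */
    void invalidate(K key) {
        entries.remove(key);
    }
}
```

With something like this, a column added to the database table would be 
picked up no later than one TTL after the stale entry was written, without 
a stop/start of the processor, while lookups inside the TTL still hit the 
cache.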

> PutDatabaseRecord should invalidate schema cache entries
> --------------------------------------------------------
>
>                 Key: NIFI-12027
>                 URL: https://issues.apache.org/jira/browse/NIFI-12027
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Joe Witt
>            Priority: Major
>
> Seen on the NiFi main/2.0 line, and also confirmed by a user on the 1.15.x line.
> If you have a flow of records, such as CSV records, feeding into a 
> PutDatabaseRecord and you add columns to the source data, things flow normally 
> as long as the 'ignore new columns/fields' properties are set, since the new 
> column values are simply ignored.  However, when you then add those new columns 
> to the database table you're writing to, the values are still not sent 
> and end up as nulls (if the database allows them).  If you stop/start the 
> PutDatabaseRecord processor, though, the values start getting set.
> This is with the default table schema cache value of 100.  If you set that 
> value to 0 then it appears to work fine without restarting.  This suggests 
> that our default caching is likely too simplistic.  We should have some 
> mechanism whereby schema changes in the database are detected and invalidate 
> any schemas we have cached.  Or we invalidate when we detect a difference in 
> the incoming schema of the data.  But the current behavior, at least with the 
> defaults, leaves the user, I think, having to choose between the default, which 
> is likely 'faster', and a slower but more dynamic path.
> Perhaps the other question here is how valuable is that schema cache in the 
> first place?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
