[ https://issues.apache.org/jira/browse/NIFI-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15811850#comment-15811850 ]

Matt Burgess commented on NIFI-2881:
------------------------------------

IMHO, supporting incoming flow files while also specifying max-value columns in 
the database fetch processors raises too many issues (state management, 
behavior with vs. without incoming flow files). Instead, I propose we use this 
Jira to allow incoming connections to GenerateTableFetch only, but add 
Expression Language (EL) support to both processors (see explanation below). 
The behavior could be as follows:

1) If there are no incoming connection(s), GenerateTableFetch will continue to 
work as-is. This allows for backwards compatibility and supports max-value 
columns as it always has.
2) If there are incoming connection(s) but no flow file(s) available, 
GenerateTableFetch will not perform any processing.
3) If there are incoming connection(s) and flow file(s) available, 
GenerateTableFetch will perform its normal processing, using the flow file and 
Expression Language evaluation while generating the query.
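The three rules above reduce to a small decision made on each scheduled run. 
Below is a minimal, hypothetical Java sketch of that decision; FetchTriggerRules, 
Action and decide() are names made up for illustration and are not part of 
NiFi's API. In a real processor the two booleans would presumably come from the 
framework (we assume ProcessContext.hasIncomingConnection() and whether 
ProcessSession.get() returns a flow file), which this sketch does not model.

```java
/**
 * Hypothetical sketch of the three proposed GenerateTableFetch trigger rules.
 * Not NiFi code: the enum and method names are illustrative only.
 */
class FetchTriggerRules {

    /** What the processor would do on one scheduled run. */
    enum Action {
        RUN_AS_IS,          // rule 1: no incoming connections, behave exactly as today
        SKIP,               // rule 2: connections exist but no flow file is queued
        RUN_WITH_FLOW_FILE  // rule 3: evaluate EL against the queued flow file
    }

    /**
     * @param hasIncomingConnection whether the processor has any incoming connection
     * @param flowFileAvailable     whether a flow file could be pulled this run
     */
    static Action decide(boolean hasIncomingConnection, boolean flowFileAvailable) {
        if (!hasIncomingConnection) {
            // Backwards-compatible path; max-value columns keep working as before.
            return Action.RUN_AS_IS;
        }
        // With incoming connections, only run when there is input to drive the query.
        return flowFileAvailable ? Action.RUN_WITH_FLOW_FILE : Action.SKIP;
    }

    public static void main(String[] args) {
        System.out.println(decide(false, false)); // RUN_AS_IS
        System.out.println(decide(true, false));  // SKIP
        System.out.println(decide(true, true));   // RUN_WITH_FLOW_FILE
    }
}
```

Keeping rule 1 as its own branch is what makes the change purely additive: 
existing flows without incoming connections never reach the new code path.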

The reason for allowing Expression Language in both QueryDatabaseTable and 
GenerateTableFetch is the addition of support for the NiFi Variable 
Registry (https://cwiki.apache.org/confluence/display/NIFI/Variable+Registry). 
This allows EL to be used with statically provided values (vs. values coming 
from flow file attributes), and can help support the development lifecycle 
for NiFi flows (e.g., a different set of variables for test vs. production).
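To illustrate how EL lets one property value resolve differently per 
environment, here is a deliberately tiny Java sketch that only understands 
plain ${name} references. It is not NiFi's Expression Language engine; 
SimpleVariableResolver and resolve() are hypothetical. We assume flow file 
attributes take precedence over registry variables when both define a name, and 
that an unresolved name becomes an empty string; the NiFi Expression Language 
guide is the authority on the actual lookup order and semantics.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical toy resolver for ${name} references; not NiFi's EL. */
class SimpleVariableResolver {

    // Matches only the simplest reference form, e.g. ${db.schema}.
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    /**
     * Replaces each ${name} with a value from the flow file attributes or,
     * failing that, the registry (assumed precedence). Unknown names become "".
     */
    static String resolve(String template, Map<String, String> attributes, Map<String, String> registry) {
        Matcher m = REFERENCE.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String value = attributes.getOrDefault(name, registry.getOrDefault(name, ""));
            // quoteReplacement stops '$' or '\' in values being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> testRegistry = Map.of("db.schema", "test_schema");
        Map<String, String> prodRegistry = Map.of("db.schema", "prod_schema");
        String tableName = "${db.schema}.orders"; // a statically configured property value

        System.out.println(resolve(tableName, Map.of(), testRegistry)); // test_schema.orders
        System.out.println(resolve(tableName, Map.of(), prodRegistry)); // prod_schema.orders
        // An attribute with the same name wins over the registry (assumed precedence).
        System.out.println(resolve(tableName, Map.of("db.schema", "adhoc"), prodRegistry)); // adhoc.orders
    }
}
```

Swapping the registry map is the analogue of deploying the same flow against 
test vs. production variables, with no flow file involved.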

The reason for not supporting incoming flow files in QueryDatabaseTable is that, 
with max-value columns precluded, its functionality would become the same as 
ExecuteSQL with a SQL query specified in the ExecuteSQL properties. Having said 
that, QDT does have a couple of features that have not made it into ExecuteSQL 
yet, so if it is prudent to add the above behavior to QDT as well, then I'm OK 
with that. I was just apprehensive about touching the QDT code if, at best (in 
theory), the result would merely be equivalent to another processor.

With this added behavior, users will be able to use ListDatabaseTables with 
GenerateTableFetch to produce SQL statements for an arbitrary number of tables, 
partitioned such that parallel fetches of appropriate size can be performed 
downstream, and the addition of EL support to both processors offers more 
flexibility as described above. It also adds no complexity in terms of state 
management, as GenerateTableFetch would be invalid if the user has specified 
both incoming connection(s) and max-value columns. This adds behavior rather 
than changing it, so with appropriate documentation I feel it would not confuse 
users.
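The paragraph above combines two ideas: GenerateTableFetch emitting one SQL 
statement per partition so that downstream fetches can run in parallel, and a 
validation rule that forbids incoming connections together with max-value 
columns. Below is a hedged Java sketch of both. The generic LIMIT/OFFSET 
paging, the ORDER BY column, and the names TablePartitionSketch, 
partitionQueries() and isValid() are assumptions for illustration; the SQL the 
real processor generates depends on the target database and is not reproduced 
here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not GenerateTableFetch's implementation. */
class TablePartitionSketch {

    /** Proposed rule: incoming connections and max-value columns are mutually exclusive. */
    static boolean isValid(boolean hasIncomingConnection, List<String> maxValueColumns) {
        return !(hasIncomingConnection && !maxValueColumns.isEmpty());
    }

    /**
     * Splits a table of rowCount rows into pages of at most partitionSize rows
     * and returns one SELECT per page, using generic LIMIT/OFFSET syntax.
     */
    static List<String> partitionQueries(String table, String orderBy, long rowCount, long partitionSize) {
        if (partitionSize <= 0) {
            throw new IllegalArgumentException("partitionSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (long offset = 0; offset < rowCount; offset += partitionSize) {
            queries.add(String.format("SELECT * FROM %s ORDER BY %s LIMIT %d OFFSET %d",
                    table, orderBy, partitionSize, offset));
        }
        return queries;
    }

    public static void main(String[] args) {
        // The table name could arrive as a flow file attribute, e.g. from ListDatabaseTables.
        for (String q : partitionQueries("orders", "id", 2500, 1000)) {
            System.out.println(q); // three statements: OFFSET 0, 1000 and 2000
        }
        System.out.println(isValid(true, List.of("id"))); // false
        System.out.println(isValid(true, List.of()));     // true
    }
}
```

Ordering by a column matters: without ORDER BY, many databases do not guarantee 
that OFFSET-based pages are disjoint, so parallel fetches could overlap or miss 
rows.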


> Allow Database Fetch processors to accept incoming flow files and use 
> Expression Language
> -----------------------------------------------------------------------------------------
>
>                 Key: NIFI-2881
>                 URL: https://issues.apache.org/jira/browse/NIFI-2881
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Matt Burgess
>
> The QueryDatabaseTable and GenerateTableFetch processors do not allow 
> Expression Language to be used in the properties, mainly because they also do 
> not allow incoming connections. This means if the user desires to fetch from 
> multiple tables, they currently need one instance of the processor for each 
> table, and those table names must be hard-coded.
> To support the same capabilities for multiple tables and more flexible 
> configuration via Expression Language, these processors should have 
> properties that accept Expression Language, and should accept (optional) 
> incoming connections.
> Conversation about the behavior of the processors is welcomed and encouraged. 
> For example, if an incoming flow file is available, do we also still run the 
> incremental fetch logic for tables that aren't specified by this flow file, 
> or do we only do incremental fetching when the processor is scheduled but 
> there is no incoming flow file? The latter implies a denial of service could 
> take place, by flooding the processor with flow files and not letting it do 
> its original job of querying the table, keeping track of maximum values, etc.
> This is likely a breaking change to the processors because of how state 
> management is implemented. Currently, since the table name is hard-coded, only 
> the column name makes up the state key. This would have to be 
> extended to have a compound key that represents table name, max-value column 
> name, etc.
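The compound key described in the issue can be sketched as follows. This is a 
hypothetical Java illustration, not NiFi's state-management code: the "@!@" 
delimiter, stateKey(), and the legacy fallback in lookup() are assumptions, 
chosen to show why a column-only key collides across tables and how existing 
column-only entries might still be read after such a change.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of compound state keys; not NiFi code. */
class CompoundStateKeySketch {

    // Illustrative delimiter; it must never occur in a table or column name.
    private static final String DELIMITER = "@!@";

    /** Builds a key that is unique per (table, max-value column) pair. */
    static String stateKey(String tableName, String maxValueColumn) {
        return tableName + DELIMITER + maxValueColumn;
    }

    /**
     * Reads the stored max value for a table/column, falling back to a legacy
     * column-only key (an assumed migration path for the breaking change).
     */
    static String lookup(Map<String, String> state, String tableName, String column) {
        String value = state.get(stateKey(tableName, column));
        return value != null ? value : state.get(column);
    }

    public static void main(String[] args) {
        Map<String, String> state = new HashMap<>();
        // With a column-only key, both tables would overwrite the single "id" entry.
        state.put(stateKey("orders", "id"), "1042");
        state.put(stateKey("customers", "id"), "87");
        System.out.println(lookup(state, "orders", "id"));    // 1042
        System.out.println(lookup(state, "customers", "id")); // 87

        Map<String, String> legacyState = new HashMap<>();
        legacyState.put("id", "500"); // written before the key format changed
        System.out.println(lookup(legacyState, "orders", "id")); // 500
    }
}
```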



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
