[
https://issues.apache.org/jira/browse/NIFI-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15811850#comment-15811850
]
Matt Burgess commented on NIFI-2881:
------------------------------------
IMHO there will be too many issues involved (state management, behavior
with/without incoming flow files) when specifying max-value columns in the
database fetch processors. Instead, I propose we use this Jira to allow
incoming connections to GenerateTableFetch only, but add Expression Language
(EL) support to both processors (see explanation below). The behavior could be
as follows:
1) If there are no incoming connection(s), GenerateTableFetch will continue to
work as-is. This allows for backwards compatibility and supports max-value
columns as it always has.
2) If there are incoming connection(s) but no flow file(s) available,
GenerateTableFetch will not perform any processing.
3) If there are incoming connection(s) and flow file(s) available,
GenerateTableFetch will perform its normal processing, using the flow file and
Expression Language evaluation while generating the query.
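
To make the three cases above concrete, here is a minimal sketch of the
decision in Python. It is not NiFi code: the function name, its parameters, and
the returned labels are hypothetical and only mirror the proposed behavior.

```python
from typing import Optional


def generate_table_fetch_action(has_incoming_connection: bool,
                                flow_file: Optional[dict]) -> str:
    """Pick the proposed behavior for one scheduling of the processor.

    `flow_file` is a hypothetical stand-in for a flow file's attributes,
    or None when nothing is queued on the incoming connection(s).
    """
    if not has_incoming_connection:
        # Case 1: no incoming connection, so behave exactly as today
        # (max-value columns keep working).
        return "run-as-is"
    if flow_file is None:
        # Case 2: connected, but nothing queued, so do no work.
        return "skip"
    # Case 3: connected and a flow file is available, so generate the
    # query using the flow file for Expression Language evaluation.
    return "run-with-flow-file"
```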
The reason for allowing Expression Language in both QueryDatabaseTable and
GenerateTableFetch is the addition of support for the NiFi Variable Registry
(https://cwiki.apache.org/confluence/display/NIFI/Variable+Registry).
This allows EL to be used with statically provided values (vs. values coming
from flow file attributes), and can aid the development lifecycle for NiFi
flows (e.g., a different set of variables for test vs. production).
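
As a rough illustration of why EL is useful even without flow files, the sketch
below resolves simple ${name} references from a set of static variables, with
optional flow file attributes layered on top. This is a simplification and not
NiFi's EL implementation: the function name, the attribute-first lookup order,
and the empty-string fallback are all assumptions made for the example, and EL
functions are not handled.

```python
import re
from typing import Dict, Optional

# Matches plain references such as ${table.name}; nothing more elaborate.
_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def evaluate(template: str,
             registry: Dict[str, str],
             attributes: Optional[Dict[str, str]] = None) -> str:
    """Substitute ${name} references in a property value.

    `registry` stands in for statically provided variables and
    `attributes` for flow file attributes (both hypothetical here).
    """
    attributes = attributes or {}

    def lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name in attributes:   # assumed: flow file attributes win
            return attributes[name]
        if name in registry:     # otherwise fall back to static variables
            return registry[name]
        return ""                # assumed: unknown names resolve to empty

    return _REFERENCE.sub(lookup, template)
```

With a registry of {"db.schema": "test"} and no flow file, "${db.schema}.orders"
resolves to "test.orders"; swapping in a production registry changes the result
without touching the flow.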
The reason for not supporting incoming flow files for QueryDatabaseTable (QDT)
is that doing so would preclude the use of max-value columns, at which point
its functionality becomes the same as ExecuteSQL with a SQL query specified in
the ExecuteSQL properties. Having said that, QDT does have a couple of features
that have not made it into ExecuteSQL yet, so if it is prudent to add the above
behavior to QDT as well, then I'm OK with that. I was just apprehensive about
touching the QDT code if at best (in theory) it results in equivalence with
another processor.
With this added behavior, users will be able to use ListDatabaseTables with
GenerateTableFetch to produce SQL statements for an arbitrary number of tables,
partitioned such that parallel fetches of appropriate size can be performed
downstream, and the addition of EL support to both processors offers more
flexibility as described above. It also adds no complexity in terms of state
management, as GenerateTableFetch would be invalid if the user has specified
both incoming connection(s) and max-value columns. This adds to (rather than
changes) existing behavior, so I feel that with the appropriate documentation
it would not be confusing to users.
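
The paragraph above combines two ideas: a validation rule and per-table
partitioned queries. A minimal sketch of both follows. The function names are
hypothetical, the LIMIT/OFFSET syntax is an assumed SQL dialect, and the row
count is taken as an input rather than queried; a real query would also need a
stable ordering for the pages to be consistent.

```python
from typing import List


def is_valid_config(has_incoming_connection: bool,
                    max_value_columns: List[str]) -> bool:
    """Proposed rule: incoming connections and max-value columns
    cannot be used together."""
    return not (has_incoming_connection and len(max_value_columns) > 0)


def page_queries(table: str, row_count: int, partition_size: int) -> List[str]:
    """Build one SELECT per page so downstream fetches can run in parallel.

    Assumes a dialect that supports LIMIT/OFFSET.
    """
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    return [
        f"SELECT * FROM {table} LIMIT {partition_size} OFFSET {offset}"
        for offset in range(0, row_count, partition_size)
    ]
```

Each table name arriving on a flow file would yield its own list of page
queries, for example three statements for a 25-row table with a partition size
of 10.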
> Allow Database Fetch processors to accept incoming flow files and use
> Expression Language
> -----------------------------------------------------------------------------------------
>
> Key: NIFI-2881
> URL: https://issues.apache.org/jira/browse/NIFI-2881
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Matt Burgess
>
> The QueryDatabaseTable and GenerateTableFetch processors do not allow
> Expression Language to be used in the properties, mainly because they also do
> not allow incoming connections. This means if the user desires to fetch from
> multiple tables, they currently need one instance of the processor for each
> table, and those table names must be hard-coded.
> To support the same capabilities for multiple tables and more flexible
> configuration via Expression Language, these processors should have
> properties that accept Expression Language, and should accept (optional)
> incoming connections.
> Conversation about the behavior of the processors is welcomed and encouraged.
> For example, if an incoming flow file is available, do we still run the
> incremental fetch logic for tables that aren't specified by this flow file,
> or do we only do incremental fetching when the processor is scheduled but
> there is no incoming flow file? The latter implies a denial-of-service could
> take place, by flooding the processor with flow files and not letting it do
> its original job of querying the table, keeping track of maximum values, etc.
> This is likely a breaking change to the processors because of how state
> management is implemented. Currently, since the table name is hard-coded, only
> the column name comprises the key in the state. This would have to be
> extended to a compound key that represents table name, max-value column
> name, etc.
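
The compound key mentioned in the description could look something like the
sketch below. It assumes the state is a flat map of string keys to string
values; the function name and the "|" delimiter are hypothetical choices for
the example, not what the processors actually use.

```python
from typing import Dict


def state_key(table_name: str, column_name: str, delimiter: str = "|") -> str:
    """Join table and column so the same column name in two tables
    no longer maps to the same state entry."""
    return f"{table_name}{delimiter}{column_name}"


# Hypothetical state holding the last seen maximum for each table/column pair.
state: Dict[str, str] = {}
state[state_key("orders", "id")] = "1042"
state[state_key("customers", "id")] = "87"
```

With column-only keys, both entries above would have been stored under "id"
and overwritten each other.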