[jira] [Commented] (NIFI-2157) Add GenerateTableFetch processor

ASF GitHub Bot (JIRA) Sat, 16 Jul 2016 18:42:52 -0700

    [ 
https://issues.apache.org/jira/browse/NIFI-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15381030#comment-15381030
 ]


ASF GitHub Bot commented on NIFI-2157:
--------------------------------------

Github user JPercivall commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/645#discussion_r71074419
  
    --- Diff: 
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/AbstractDatabaseFetchProcessor.java
 ---
    @@ -0,0 +1,361 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.processors.standard;
    +
    +import org.apache.nifi.components.PropertyDescriptor;
    +import org.apache.nifi.dbcp.DBCPService;
    +import org.apache.nifi.processor.AbstractSessionFactoryProcessor;
    +import org.apache.nifi.processor.ProcessContext;
    +import org.apache.nifi.processor.Relationship;
    +import org.apache.nifi.processor.exception.ProcessException;
    +import org.apache.nifi.processor.util.StandardValidators;
    +import org.apache.nifi.processors.standard.db.DatabaseAdapter;
    +
    +import java.io.IOException;
    +import java.math.BigDecimal;
    +import java.sql.Connection;
    +import java.sql.ResultSet;
    +import java.sql.ResultSetMetaData;
    +import java.sql.SQLException;
    +import java.sql.Statement;
    +import java.sql.Timestamp;
    +import java.text.DecimalFormat;
    +import java.text.ParseException;
    +import java.util.Date;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +import java.util.ServiceLoader;
    +import java.util.Set;
    +
    +import static java.sql.Types.ARRAY;
    +import static java.sql.Types.BIGINT;
    +import static java.sql.Types.BINARY;
    +import static java.sql.Types.BIT;
    +import static java.sql.Types.BLOB;
    +import static java.sql.Types.BOOLEAN;
    +import static java.sql.Types.CLOB;
    +import static java.sql.Types.DECIMAL;
    +import static java.sql.Types.DOUBLE;
    +import static java.sql.Types.FLOAT;
    +import static java.sql.Types.INTEGER;
    +import static java.sql.Types.LONGVARBINARY;
    +import static java.sql.Types.NUMERIC;
    +import static java.sql.Types.CHAR;
    +import static java.sql.Types.DATE;
    +import static java.sql.Types.LONGNVARCHAR;
    +import static java.sql.Types.LONGVARCHAR;
    +import static java.sql.Types.NCHAR;
    +import static java.sql.Types.NVARCHAR;
    +import static java.sql.Types.REAL;
    +import static java.sql.Types.ROWID;
    +import static java.sql.Types.SMALLINT;
    +import static java.sql.Types.TIME;
    +import static java.sql.Types.TIMESTAMP;
    +import static java.sql.Types.TINYINT;
    +import static java.sql.Types.VARBINARY;
    +import static java.sql.Types.VARCHAR;
    +
    +/**
    + * A base class for common code shared by processors that fetch RDBMS data.
    + */
    +public abstract class AbstractDatabaseFetchProcessor extends 
AbstractSessionFactoryProcessor {
    +
    +    // Relationships
    +    public static final Relationship REL_SUCCESS = new 
Relationship.Builder()
    +            .name("success")
    +            .description("Successfully created FlowFile from SQL query 
result set.")
    +            .build();
    +
    +    protected Set<Relationship> relationships;
    +
    +    // Properties
    +    public static final PropertyDescriptor DBCP_SERVICE = new 
PropertyDescriptor.Builder()
    +            .name("Database Connection Pooling Service")
    +            .description("The Controller Service that is used to obtain a 
connection to the database.")
    +            .required(true)
    +            .identifiesControllerService(DBCPService.class)
    +            .build();
    +
    +    public static final PropertyDescriptor TABLE_NAME = new 
PropertyDescriptor.Builder()
    +            .name("Table Name")
    +            .description("The name of the database table to be queried.")
    +            .required(true)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor COLUMN_NAMES = new 
PropertyDescriptor.Builder()
    +            .name("Columns to Return")
    +            .description("A comma-separated list of column names to be 
used in the query. If your database requires "
    +                    + "special treatment of the names (quoting, e.g.), 
each name should include such treatment. If no "
    +                    + "column names are supplied, all columns in the 
specified table will be returned.")
    +            .required(false)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor MAX_VALUE_COLUMN_NAMES = new 
PropertyDescriptor.Builder()
    +            .name("Maximum-value Columns")
    +            .description("A comma-separated list of column names. The 
processor will keep track of the maximum value "
    +                    + "for each column that has been returned since the 
processor started running. This can be used to "
    +                    + "retrieve only those rows that have been 
added/updated since the last retrieval. Note that some "
    +                    + "JDBC types such as bit/boolean are not conducive to 
maintaining maximum value, so columns of these "
    +                    + "types should not be listed in this property, and 
will result in error(s) during processing.")
    +            .required(false)
    +            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
    +            .build();
    +
    +    public static final PropertyDescriptor QUERY_TIMEOUT = new 
PropertyDescriptor.Builder()
    +            .name("Max Wait Time")
    +            .description("The maximum amount of time allowed for a running 
SQL select query "
    +                    + ", zero means there is no limit. Max time less than 
1 second will be equal to zero.")
    +            .defaultValue("0 seconds")
    +            .required(true)
    +            .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
    +            .build();
    +
    +    protected List<PropertyDescriptor> propDescriptors;
    +
    +    public static final PropertyDescriptor DB_TYPE;
    +
    +    protected final static Map<String, DatabaseAdapter> dbAdapters = new 
HashMap<>();
    +    protected final Map<String, Integer> columnTypeMap = new HashMap<>();
    +
    +    static {
    +        // Load the DatabaseAdapters
    +        ServiceLoader<DatabaseAdapter> dbAdapterLoader = 
ServiceLoader.load(DatabaseAdapter.class);
    +        dbAdapterLoader.forEach(it -> dbAdapters.put(it.getName(), it));
    +
    +        DB_TYPE = new PropertyDescriptor.Builder()
    +                .name("db-fetch-db-type")
    +                .displayName("Database Type")
    +                .description("The type/flavor of database, used for 
generating database-specific code. In many cases the Generic type "
    +                        + "should suffice, but some databases (such as 
Oracle) require custom SQL clauses. ")
    +                .allowableValues(dbAdapters.keySet())
    +                
.defaultValue(dbAdapters.values().stream().findFirst().get().getName())
    +                .required(true)
    +                .build();
    +    }
    +
    +    public void setup(final ProcessContext context) {
    +        // Try to fill the columnTypeMap with the types of the desired 
max-value columns
    +        final DBCPService dbcpService = 
context.getProperty(DBCP_SERVICE).asControllerService(DBCPService.class);
    +        final String tableName = 
context.getProperty(TABLE_NAME).getValue();
    +        final String maxValueColumnNames = 
context.getProperty(MAX_VALUE_COLUMN_NAMES).getValue();
    +        final DatabaseAdapter dbAdapter = 
dbAdapters.get(context.getProperty(DB_TYPE).getValue());
    +
    +        try (final Connection con = dbcpService.getConnection();
    +             final Statement st = con.createStatement()) {
    +
    +            // Try a query that returns no rows, for the purposes of 
getting metadata about the columns. It is possible
    +            // to use DatabaseMetaData.getColumns(), but not all drivers 
support this, notably the schema-on-read
    +            // approach as in Apache Drill
    +            String query = dbAdapter.getSelectStatement(tableName, 
maxValueColumnNames, "1 = 0", null, null, null);
    +            ResultSet resultSet = st.executeQuery(query);
    +            ResultSetMetaData resultSetMetaData = resultSet.getMetaData();
    +            int numCols = resultSetMetaData.getColumnCount();
    +            if (numCols > 0) {
    +                columnTypeMap.clear();
    +                for (int i = 1; i <= numCols; i++) {
    +                    String colName = 
resultSetMetaData.getColumnName(i).toLowerCase();
    +                    int colType = resultSetMetaData.getColumnType(i);
    +                    columnTypeMap.put(colName, colType);
    +                }
    +
    +            } else {
    +                throw new ProcessException("No columns found in table from 
those specified: " + maxValueColumnNames);
    +            }
    +
    +        } catch (SQLException e) {
    +            throw new ProcessException("Unable to communicate with 
database in order to determine column types", e);
    +        }
    +    }
    +
    +    protected static String getMaxValueFromRow(ResultSet resultSet,
    +                                            int columnIndex,
    +                                            Integer type,
    +                                            String maxValueString,
    +                                            String preProcessStrategy)
    +            throws ParseException, IOException, SQLException {
    +
    +        // Skip any columns we're not keeping track of or whose value is 
null
    +        if (type == null || resultSet.getObject(columnIndex) == null) {
    +            return null;
    +        }
    +
    +        switch (type) {
    +            case CHAR:
    +            case LONGNVARCHAR:
    +            case LONGVARCHAR:
    +            case NCHAR:
    +            case NVARCHAR:
    +            case VARCHAR:
    +            case ROWID:
    +                String colStringValue = resultSet.getString(columnIndex);
    +                if (maxValueString == null || 
colStringValue.compareTo(maxValueString) > 0) {
    +                    return colStringValue;
    +                }
    +                break;
    +
    +            case INTEGER:
    +            case SMALLINT:
    +            case TINYINT:
    +                Integer colIntValue = resultSet.getInt(columnIndex);
    +                Integer maxIntValue = null;
    +                if (maxValueString != null) {
    +                    maxIntValue = Integer.valueOf(maxValueString);
    +                }
    +                if (maxIntValue == null || colIntValue > maxIntValue) {
    +                    return colIntValue.toString();
    +                }
    +                break;
    +
    +            case BIGINT:
    +                Long colLongValue = resultSet.getLong(columnIndex);
    +                Long maxLongValue = null;
    +                if (maxValueString != null) {
    +                    maxLongValue = Long.valueOf(maxValueString);
    +                }
    +                if (maxLongValue == null || colLongValue > maxLongValue) {
    +                    return colLongValue.toString();
    +                }
    +                break;
    +
    +            case FLOAT:
    +            case REAL:
    +            case DOUBLE:
    +                Double colDoubleValue = resultSet.getDouble(columnIndex);
    +                Double maxDoubleValue = null;
    +                if (maxValueString != null) {
    +                    maxDoubleValue = Double.valueOf(maxValueString);
    +                }
    +                if (maxDoubleValue == null || colDoubleValue > 
maxDoubleValue) {
    +                    return colDoubleValue.toString();
    +                }
    +                break;
    +
    +            case DECIMAL:
    +            case NUMERIC:
    +                BigDecimal colBigDecimalValue = 
resultSet.getBigDecimal(columnIndex);
    +                BigDecimal maxBigDecimalValue = null;
    +                if (maxValueString != null) {
    +                    DecimalFormat df = new DecimalFormat();
    +                    df.setParseBigDecimal(true);
    +                    maxBigDecimalValue = (BigDecimal) 
df.parse(maxValueString);
    +                }
    +                if (maxBigDecimalValue == null || 
colBigDecimalValue.compareTo(maxBigDecimalValue) > 0) {
    +                    return colBigDecimalValue.toString();
    +                }
    +                break;
    +
    +            case DATE:
    +                Date rawColDateValue = resultSet.getDate(columnIndex);
    +                java.sql.Date colDateValue = new 
java.sql.Date(rawColDateValue.getTime());
    +                java.sql.Date maxDateValue = null;
    +                if (maxValueString != null) {
    +                    maxDateValue = java.sql.Date.valueOf(maxValueString);
    +                }
    +                if (maxDateValue == null || 
colDateValue.after(maxDateValue)) {
    +                    return colDateValue.toString();
    +                }
    +                break;
    +
    +            case TIME:
    +                Date rawColTimeValue = resultSet.getDate(columnIndex);
    +                java.sql.Time colTimeValue = new 
java.sql.Time(rawColTimeValue.getTime());
    +                java.sql.Time maxTimeValue = null;
    +                if (maxValueString != null) {
    +                    maxTimeValue = java.sql.Time.valueOf(maxValueString);
    +                }
    +                if (maxTimeValue == null || 
colTimeValue.after(maxTimeValue)) {
    +                    return colTimeValue.toString();
    +                }
    +                break;
    +
    +            case TIMESTAMP:
    +                // Oracle timestamp queries must use literals in 
java.sql.Date format
    +                if ("Oracle".equals(preProcessStrategy)) {
    +                    Date rawColOracleTimestampValue = 
resultSet.getDate(columnIndex);
    +                    java.sql.Date oracleTimestampValue = new 
java.sql.Date(rawColOracleTimestampValue.getTime());
    +                    java.sql.Date maxOracleTimestampValue = null;
    +                    if (maxValueString != null) {
    +                        maxOracleTimestampValue = 
java.sql.Date.valueOf(maxValueString);
    +                    }
    +                    if (maxOracleTimestampValue == null || 
oracleTimestampValue.after(maxOracleTimestampValue)) {
    +                        return oracleTimestampValue.toString();
    +                    }
    +                } else {
    +                    Timestamp rawColTimestampValue = 
resultSet.getTimestamp(columnIndex);
    +                    java.sql.Timestamp colTimestampValue = new 
java.sql.Timestamp(rawColTimestampValue.getTime());
    +                    java.sql.Timestamp maxTimestampValue = null;
    +                    if (maxValueString != null) {
    +                        maxTimestampValue = 
java.sql.Timestamp.valueOf(maxValueString);
    +                    }
    +                    if (maxTimestampValue == null || 
colTimestampValue.after(maxTimestampValue)) {
    +                        return colTimestampValue.toString();
    +                    }
    +                }
    +                break;
    +
    +            case BIT:
    +            case BOOLEAN:
    +            case BINARY:
    +            case VARBINARY:
    +            case LONGVARBINARY:
    +            case ARRAY:
    +            case BLOB:
    +            case CLOB:
    +            default:
    +                throw new IOException("Type for column " + columnIndex + " 
is not valid for maintaining maximum value");
    +        }
    +        return null;
    +    }
    +
    +    /**
    +     * Returns a SQL literal for the given value based on its type. For 
example, values of character type need to be enclosed
    +     * in single quotes, whereas values of numeric type should not be.
    +     *
    +     * @param type  The JDBC type for the desired literal
    +     * @param value The value to be converted to a SQL literal
    +     * @return A String representing the given value as a literal of the 
given type
    +     */
    +    protected static String getLiteralByType(int type, String value, 
String preProcessStrategy) {
    --- End diff --
    
    The last parameter shouldn't be "preProcessStrategy" since that property 
was removed in favor of database type.


> Add GenerateTableFetch processor
> --------------------------------
>
>                 Key: NIFI-2157
>                 URL: https://issues.apache.org/jira/browse/NIFI-2157
>             Project: Apache NiFi
>          Issue Type: Sub-task
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>             Fix For: 1.0.0
>
>
> This processor would presumably operate like QueryDatabaseTable, except it 
> will contain a "Partition Size" property, and rather than executing the SQL 
> statement(s) to fetch rows, it would generate flow files containing SQL 
> statements that will select rows from a table. If the partition size is 
> indicated, then the SELECT statements will refer to a range of rows, such 
> that each statement will grab only a portion of the table. If max-value 
> columns are specified, then only rows whose observed values for those columns 
> exceed the current maximum will be fetched (i.e. like QueryDatabaseTable). 
> These flow files (due to NIFI-1973) can be passed to ExecuteSQL processors 
> for the actual fetching of rows, and ExecuteSQL can be distributed across 
> cluster nodes and/or multiple tasks. These features enable distributed 
> incremental fetching of rows from database table(s).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (NIFI-2157) Add GenerateTableFetch processor

Reply via email to