Really, what I'd like to do is this type of MySQL bread 'n' butter task:

LOAD DATA INFILE <my_csv_file_ingested_as_a_flowfile>
INTO TABLE <my_destination_table>
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 3 ROWS;
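(For comparison, here's roughly what that statement does, sketched client-side in Python. The sample data, table name, and three-row header count are made up for illustration; with a real MySQL connection you'd hand `stmt` and `rows` to `cursor.executemany`, since Python's MySQL drivers use `%s` placeholders.)

```python
import csv
import io

def csv_to_bulk_insert(csv_text, table, skip_rows=3):
    """Parse CSV text, drop the leading header rows, and return a
    parameterized INSERT statement plus the row tuples -- a rough
    client-side stand-in for LOAD DATA INFILE ... IGNORE 3 ROWS."""
    reader = csv.reader(io.StringIO(csv_text))
    rows = [tuple(r) for r in reader][skip_rows:]
    if not rows:
        return None, []
    # One %s placeholder per column, inferred from the first data row.
    placeholders = ", ".join(["%s"] * len(rows[0]))
    stmt = f"INSERT INTO {table} VALUES ({placeholders})"
    return stmt, rows

# Hypothetical CSV with a 3-line header, as in the IGNORE 3 ROWS example:
sample = "h1,h2\nmeta,meta\nunits,units\n1,foo\n2,bar\n"
stmt, rows = csv_to_bulk_insert(sample, "my_destination_table")
```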
Russell

On Mon, Oct 5, 2015 at 2:09 PM, Russell Whitaker <[email protected]> wrote:
> Bryan,
>
> Some of the CSV files are as small as 6 columns and a thousand lines
> or so of entries; some are many more columns and thousands of lines.
> I'm hoping to avoid the necessity of spawning a flowfile per line;
> I'm hoping there's the NiFi equivalent of the SQL DML statement
> LOAD DATA INFILE. (Relatedly, being able to toggle off foreign key &
> uniqueness checks and transaction isolation guarantees during bulk
> load would be very nice...)
>
> Russell
>
> On Mon, Oct 5, 2015 at 1:53 PM, Bryan Bende <[email protected]> wrote:
>> Russell,
>>
>> How big are these CSVs in terms of rows and columns?
>>
>> If they aren't too big, another option could be to use SplitText +
>> ReplaceText to split the csv into a FlowFile per line, and then convert
>> each line into SQL in ReplaceText. The downside is that this would
>> create a lot of FlowFiles for very large CSVs.
>>
>> -Bryan
>>
>> On Mon, Oct 5, 2015 at 4:14 PM, Russell Whitaker <[email protected]> wrote:
>>>
>>> Use case I'm attempting:
>>>
>>> 1.) ingest a CSV file with header lines;
>>> 2.) remove header lines (i.e. remove N lines at head);
>>> 3.) SQL INSERT each remaining line as a row in an existing mysql table.
>>>
>>> My thinking so far:
>>>
>>> #1 is given (CSV fetched already);
>>> #2 simple, should be handled in the context of ExecuteStreamCommand;
>>> #3 is where I'm scratching my head: I keep re-reading the Description
>>> field for the PutSQL processor in http://nifi.apache.org/docs.html but
>>> can't seem to parse this into what I need to do to prepare a flowfile
>>> comprising lines of comma-separated text into a series of INSERT
>>> statements:
>>>
>>> "Executes a SQL UPDATE or INSERT command. The content of an incoming
>>> FlowFile is expected to be the SQL command to execute. The SQL command
>>> may use the ? to escape parameters. In this case, the parameters to
>>> use must exist as FlowFile attributes with the naming convention
>>> sql.args.N.type and sql.args.N.value, where N is a positive integer.
>>> The sql.args.N.type is expected to be a number indicating the JDBC
>>> Type."
>>>
>>> Of related interest: there seems to be only one CSV-relevant processor
>>> type in v0.3.0, ConvertCSVToAvro; I fear the need to do something
>>> like this:
>>>
>>> ConvertCSVToAvro->ConvertAvroToJSON->ConvertJSONToSQL->PutSQL
>>>
>>> Guidance, suggestions? Thanks!
>>>
>>> Russell
>>>
>>> --
>>> Russell Whitaker
>>> http://twitter.com/OrthoNormalRuss
>>> http://www.linkedin.com/pub/russell-whitaker/0/b86/329
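(To make the PutSQL convention quoted above concrete: the SplitText + ReplaceText route Bryan describes would need to turn each CSV line into a FlowFile whose content is a parameterized INSERT and whose attributes carry the values. Here is a Python sketch of that target content/attribute shape, not NiFi code; the table and column names are invented, and every field is typed as JDBC type 12, i.e. java.sql.Types.VARCHAR, for simplicity.)

```python
def line_to_putsql(line, table, columns):
    """Model what one FlowFile headed into PutSQL should look like:
    a '?'-parameterized INSERT as content, plus sql.args.N.type /
    sql.args.N.value attributes (N is 1-based) holding the values."""
    fields = line.rstrip("\n").split(",")
    placeholders = ", ".join("?" for _ in fields)
    content = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    attributes = {}
    for n, value in enumerate(fields, start=1):
        attributes[f"sql.args.{n}.type"] = "12"   # JDBC VARCHAR
        attributes[f"sql.args.{n}.value"] = value
    return content, attributes

# One CSV line from a hypothetical two-column table:
content, attrs = line_to_putsql("42,hello", "my_table", ["id", "msg"])
```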
