Bryan,

Some of the CSV files are as small as 6 columns and a thousand or so lines
of entries; some have many more columns and thousands of lines. I'm hoping
to avoid spawning a FlowFile per line; ideally there's a NiFi equivalent of
the SQL DML statement LOAD DATA INFILE. (Relatedly, being able to toggle
off foreign key & uniqueness checks and transaction isolation guarantees
during a bulk load would be very nice...)
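To make concrete what I mean by the LOAD DATA INFILE path, here's a rough
Python sketch of the statement sequence I'd want NiFi to drive. The table
name, file path, and header count are placeholders, and LOCAL INFILE has to
be enabled on both the MySQL client and server for this to work:

```python
def bulk_load_statements(table, csv_path, skip_header_lines=1):
    """Return the MySQL statement sequence for a bulk CSV load,
    with foreign key and uniqueness checks toggled off around it.
    Sketch only: table/csv_path are placeholders, not a real schema."""
    return [
        "SET foreign_key_checks = 0;",
        "SET unique_checks = 0;",
        ("LOAD DATA LOCAL INFILE '{path}' INTO TABLE {table} "
         "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' "
         "IGNORE {skip} LINES;").format(
            path=csv_path, table=table, skip=skip_header_lines),
        "SET unique_checks = 1;",
        "SET foreign_key_checks = 1;",
    ]
```

The IGNORE N LINES clause handles the header-stripping step (#2 in my
original message) on the database side instead of in the flow.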
Russell

On Mon, Oct 5, 2015 at 1:53 PM, Bryan Bende <[email protected]> wrote:

> Russell,
>
> How big are these CSVs in terms of rows and columns?
>
> If they aren't too big, another option could be to use SplitText +
> ReplaceText to split the CSV into a FlowFile per line, and then convert
> each line into SQL in ReplaceText. The downside is that this would create
> a lot of FlowFiles for very large CSVs.
>
> -Bryan
>
> On Mon, Oct 5, 2015 at 4:14 PM, Russell Whitaker <[email protected]> wrote:
>
>> Use case I'm attempting:
>>
>> 1.) ingest a CSV file with header lines;
>> 2.) remove header lines (i.e. remove N lines at head);
>> 3.) SQL INSERT each remaining line as a row in an existing MySQL table.
>>
>> My thinking so far:
>>
>> #1 is a given (CSV fetched already);
>> #2 is simple, and should be handled in the context of ExecuteStreamCommand;
>> #3 is where I'm scratching my head: I keep re-reading the Description
>> field for the PutSQL processor at http://nifi.apache.org/docs.html but
>> can't parse it into what I need to do to turn a FlowFile of lines of
>> comma-separated text into a series of INSERT statements:
>>
>> "Executes a SQL UPDATE or INSERT command. The content of an incoming
>> FlowFile is expected to be the SQL command to execute. The SQL command
>> may use the ? to escape parameters. In this case, the parameters to
>> use must exist as FlowFile attributes with the naming convention
>> sql.args.N.type and sql.args.N.value, where N is a positive integer.
>> The sql.args.N.type is expected to be a number indicating the JDBC
>> Type."
>>
>> Of related interest: there seems to be only one CSV-relevant processor
>> type in v0.3.0, ConvertCSVToAvro; I fear needing to do something like
>> this:
>>
>> ConvertCSVToAvro -> ConvertAvroToJSON -> ConvertJSONToSQL -> PutSQL
>>
>> Guidance, suggestions? Thanks!
>> >> Russell >> >> -- >> Russell Whitaker >> http://twitter.com/OrthoNormalRuss >> http://www.linkedin.com/pub/russell-whitaker/0/b86/329 >> -- Russell Whitaker http://twitter.com/OrthoNormalRuss http://www.linkedin.com/pub/russell-whitaker/0/b86/329
