Bryan,

Some of the CSV files are as small as 6 columns and a thousand or so
lines of entries; some have many more columns and thousands of lines.
I'm hoping to avoid spawning a flowfile per line; ideally there's a
NiFi equivalent of the SQL DML statement LOAD DATA INFILE. (Relatedly,
being able to toggle off foreign key & uniqueness checks and
transaction isolation guarantees during bulk load would be very
nice...)
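(For context, the MySQL-side bulk load and session toggles I have in
mind look roughly like this -- the table name and file path below are
placeholders, not anything from this thread:)

```sql
-- Sketch only: 'mytable' and the path are hypothetical.
SET foreign_key_checks = 0;
SET unique_checks = 0;
LOAD DATA INFILE '/path/to/file.csv'
INTO TABLE mytable
FIELDS TERMINATED BY ','
IGNORE 1 LINES;  -- skip the header line
SET unique_checks = 1;
SET foreign_key_checks = 1;
```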

Russell

On Mon, Oct 5, 2015 at 1:53 PM, Bryan Bende <[email protected]> wrote:
> Russell,
>
> How big are these CSVs in terms of rows and columns?
>
> If they aren't too big, another option could be to use SplitText +
> ReplaceText to split the csv into a FlowFile per line, and then convert
> each line into SQL in ReplaceText. The downside is that this would create a
> lot of FlowFiles for very large CSVs.
>
> -Bryan
>
> On Mon, Oct 5, 2015 at 4:14 PM, Russell Whitaker <[email protected]
>> wrote:
>
>> Use case I'm attempting:
>>
>> 1.) ingest a CSV file with header lines;
>> 2.) remove header lines (i.e. remove N lines at head);
>> 3.) SQL INSERT each remaining line as a row in an existing MySQL table.
>>
>> My thinking so far:
>>
>> #1 is given (CSV fetched already);
>> #2 is simple, should be handled in the context of ExecuteStreamCommand;
>>
>> #3 is where I'm scratching my head: I keep re-reading the Description
>> field for the PutSQL processor in http://nifi.apache.org/docs.html but
>> can't seem to work out from it how to turn a flowfile comprising lines
>> of comma-separated text into a series of INSERT statements:
>>
>> "Executes a SQL UPDATE or INSERT command. The content of an incoming
>> FlowFile is expected to be the SQL command to execute. The SQL command
>> may use the ? to escape parameters. In this case, the parameters to
>> use must exist as FlowFile attributes with the naming convention
>> sql.args.N.type and sql.args.N.value, where N is a positive integer.
>> The sql.args.N.type is expected to be a number indicating the JDBC
>> Type."
>>
>> Of related interest: there seems to be only one CSV-relevant processor
>> type in v0.3.0, ConvertCSVToAvro; I fear needing to do something like
>> this:
>>
>> ConvertCSVToAvro->ConvertAvroToJSON->ConvertJSONToSQL->PutSQL
>>
>> Guidance, suggestions? Thanks!
>>
>> Russell
>>
>> --
>> Russell Whitaker
>> http://twitter.com/OrthoNormalRuss
>> http://www.linkedin.com/pub/russell-whitaker/0/b86/329
>>
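
(As a concrete reading of the PutSQL description quoted above: each CSV
row would become one INSERT with ? placeholders, plus the matching
sql.args.N.type / sql.args.N.value flowfile attributes. A minimal
sketch in Python -- the table and column names are hypothetical, and 12
is java.sql.Types.VARCHAR:)

```python
import csv
import io

JDBC_VARCHAR = 12  # java.sql.Types.VARCHAR

def row_to_putsql(line, columns):
    """Turn one CSV line into (sql, attributes) in PutSQL's convention.

    'mytable' is a hypothetical table name; every value is bound as
    VARCHAR for simplicity.
    """
    (values,) = csv.reader(io.StringIO(line))
    placeholders = ", ".join("?" for _ in columns)
    sql = "INSERT INTO mytable (%s) VALUES (%s)" % (
        ", ".join(columns), placeholders)
    attrs = {}
    for i, v in enumerate(values, start=1):
        attrs["sql.args.%d.type" % i] = str(JDBC_VARCHAR)
        attrs["sql.args.%d.value" % i] = v
    return sql, attrs

sql, attrs = row_to_putsql("alice,42", ["name", "age"])
```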



-- 
Russell Whitaker
http://twitter.com/OrthoNormalRuss
http://www.linkedin.com/pub/russell-whitaker/0/b86/329
