Really, what I'd like to do is this kind of bread-and-butter MySQL task:

LOAD DATA INFILE <my_csv_file_ingested_as_a_flowfile>
INTO TABLE <my_destination_table>
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 3 ROWS;
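A minimal sketch (Python, hypothetical helper name) of assembling that statement from values a flow might supply, e.g. via a script invoked from a processor; it only builds the SQL string, and assumes the path and table name are trusted (no escaping is done):

```python
def build_load_data_sql(csv_path, table, skip_rows=3):
    """Build a MySQL LOAD DATA INFILE statement for a CSV file.

    Hypothetical helper: csv_path and table are assumed to be trusted
    values (e.g. taken from flowfile attributes); they are interpolated
    without quoting or escaping.
    """
    return (
        "LOAD DATA INFILE '{}' INTO TABLE {} ".format(csv_path, table)
        + "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
        + "LINES TERMINATED BY '\\n' "
        + "IGNORE {} ROWS;".format(skip_rows)
    )
```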

Russell


On Mon, Oct 5, 2015 at 2:09 PM, Russell Whitaker
<[email protected]> wrote:
> Bryan,
>
> Some of the CSV files are as small as 6 columns and a thousand or so
> lines of entries; some have many more columns and thousands of lines.
> I'm hoping to avoid the necessity of spawning a flowfile per line;
> I'm hoping there's the NiFi equivalent of the SQL DML statement
> LOAD DATA INFILE. (Relatedly, being able to toggle off foreign key &
> uniqueness checks and transaction isolation guarantees during bulk
> load would be very nice...)
>
> Russell
>
> On Mon, Oct 5, 2015 at 1:53 PM, Bryan Bende <[email protected]> wrote:
>> Russell,
>>
>> How big are these CSVs in terms of rows and columns?
>>
>> If they aren't too big, another option could be to use SplitText +
>> ReplaceText to split the csv into a FlowFile per line, and then convert
>> each line into SQL in ReplaceText. The downside is that this would create a
>> lot of FlowFiles for very large CSVs.
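As a rough illustration of what that per-line conversion would produce, a sketch (Python, hypothetical helper; the naive value quoting here is only for illustration, and parameterized SQL is safer in practice):

```python
import csv
import io

def csv_line_to_insert(line, table, columns):
    """Turn one CSV record into an INSERT statement, roughly what the
    ReplaceText step would emit for each single-line FlowFile.

    Hypothetical helper: values are quoted naively (single quotes
    doubled); real use should prefer parameterized SQL.
    """
    fields = next(csv.reader(io.StringIO(line)))
    quoted = ", ".join("'" + f.replace("'", "''") + "'" for f in fields)
    return "INSERT INTO {} ({}) VALUES ({});".format(
        table, ", ".join(columns), quoted)
```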
>>
>> -Bryan
>>
>> On Mon, Oct 5, 2015 at 4:14 PM, Russell Whitaker <[email protected]> wrote:
>>
>>> Use case I'm attempting:
>>>
>>> 1.) ingest a CSV file with header lines;
>>> 2.) remove the header lines (i.e. remove N lines at the head);
>>> 3.) SQL INSERT each remaining line as a row in an existing MySQL table.
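Step 2 above can be sketched as a tiny helper suitable for a stdin/stdout-style script processor (hypothetical function, assuming N is known up front):

```python
def strip_header_lines(text, n):
    """Drop the first n lines of a flowfile's text content (step 2).

    Hypothetical helper: in NiFi this logic could live in a small
    script fed the flowfile content on stdin.
    """
    return "".join(text.splitlines(keepends=True)[n:])
```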
>>>
>>> My thinking so far:
>>>
>>> #1 is given (the CSV is fetched already);
>>> #2 is simple; it should be handled in the context of ExecuteStreamCommand;
>>>
>>> #3 is where I'm scratching my head: I keep re-reading the Description
>>> field for the PutSQL processor in http://nifi.apache.org/docs.html,
>>> but I can't work out from it how to turn a flowfile of
>>> comma-separated lines of text into a series of INSERT statements:
>>>
>>> "Executes a SQL UPDATE or INSERT command. The content of an incoming
>>> FlowFile is expected to be the SQL command to execute. The SQL command
>>> may use the ? to escape parameters. In this case, the parameters to
>>> use must exist as FlowFile attributes with the naming convention
>>> sql.args.N.type and sql.args.N.value, where N is a positive integer.
>>> The sql.args.N.type is expected to be a number indicating the JDBC
>>> Type."
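Modeling what that description asks for, a sketch (Python, hypothetical helper) of a parameterized-SQL FlowFile: the content is the SQL with `?` placeholders, and `sql.args.N.type` / `sql.args.N.value` attributes carry the parameters, with types given as java.sql.Types codes (4=INTEGER, 12=VARCHAR):

```python
def flowfile_for_putsql(table, columns, values, jdbc_types):
    """Model the FlowFile PutSQL expects: content is parameterized SQL,
    attributes hold sql.args.N.type / sql.args.N.value (N is 1-based).

    Hypothetical helper: jdbc_types are java.sql.Types codes,
    e.g. 4 = INTEGER, 12 = VARCHAR.
    """
    placeholders = ", ".join("?" for _ in columns)
    content = "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), placeholders)
    attributes = {}
    for i, (v, t) in enumerate(zip(values, jdbc_types), start=1):
        attributes["sql.args.{}.type".format(i)] = str(t)
        attributes["sql.args.{}.value".format(i)] = str(v)
    return content, attributes
```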
>>>
>>> Of related interest: there seems to be only one CSV-relevant
>>> processor type in v0.3.0, ConvertCSVToAvro; I fear the need to do
>>> something like this:
>>>
>>> ConvertCSVToAvro->ConvertAvroToJSON->ConvertJSONToSQL->PutSQL
>>>
>>> Guidance, suggestions? Thanks!
>>>
>>> Russell
>>>
>>> --
>>> Russell Whitaker
>>> http://twitter.com/OrthoNormalRuss
>>> http://www.linkedin.com/pub/russell-whitaker/0/b86/329
>>>
>
>
>
> --
> Russell Whitaker
> http://twitter.com/OrthoNormalRuss
> http://www.linkedin.com/pub/russell-whitaker/0/b86/329



-- 
Russell Whitaker
http://twitter.com/OrthoNormalRuss
http://www.linkedin.com/pub/russell-whitaker/0/b86/329
