Hi Russell,

I understand what you are getting at... I don't think the current
processors we have are designed to handle this bulk load scenario.

The series of processors you outlined in your original email would likely
work, but I *think* it would suffer the same problem of producing a lot of
FlowFiles because ConvertJSONToSQL will produce a FlowFile with one SQL
statement for each entry of the JSON array. You would have to try this out
to see if you are getting the performance you want with your data set.
PutSQL does have a configurable batch size that lets multiple inserts
happen through a single database transaction.

I'm not that familiar with the "LOAD DATA INFILE" command, but I'm
wondering if another option is to somehow use ExecuteStreamCommand to
execute that statement based on the filename from an incoming FlowFile.
Just thinking out loud about what else could be done.

-Bryan



On Mon, Oct 5, 2015 at 5:24 PM, Russell Whitaker <[email protected]
> wrote:

> Really, what I'd like to do is this type of msql bread 'n butter task:
>
> LOAD DATA INFILE <my_csv_file_ingested_as_a_flowfile>
> INTO TABLE <my_destination_table>
> FIELDS TERMINATED BY ','
> ENCLOSED BY '"'
> LINES TERMINATED BY '\n'
> IGNORE 3 ROWS;
>
> Russell
>
>
> On Mon, Oct 5, 2015 at 2:09 PM, Russell Whitaker
> <[email protected]> wrote:
> > Bryan,
> >
> > Some of the CSV files are as small as 6 columns and a thousand lines
> > or so of entries;
> > some are many more columns and thousands of lines. I'm hoping to avoid
> > the necessity
> > of spawning a flowfile per line; I'm hoping there's the Nifi
> > equivalent of the SQL DML
> > statement LOAD DATA INFILE. (Relatedly, being able to toggle off
> > foreign key & uniqueness
> > checks and transaction isolation guarantees during bulk load would be
> > very nice...)
> >
> > Russell
> >
> > On Mon, Oct 5, 2015 at 1:53 PM, Bryan Bende <[email protected]> wrote:
> >> Russell,
> >>
> >> How big are these CSVs in terms of rows and columns?
> >>
> >> If they aren't too big, another option could be to use SplitText +
> >> ReplaceText to split the csv into a FlowFile per line, and then convert
> >> each line into SQL in ReplaceText. The downside is that this would
> create a
> >> lot of FlowFiles for very large CSVs.
> >>
> >> -Bryan
> >>
> >> On Mon, Oct 5, 2015 at 4:14 PM, Russell Whitaker <
> [email protected]
> >>> wrote:
> >>
> >>> Use case I'm attempting:
> >>>
> >>> 1.) ingest a CSV file with header lines;
> >>> 2.) remove header lines (i.e. remove N lines at head);
> >>> 2.) SQL INSERT each remaining line as a row in an existing mysql table.
> >>>
> >>> My thinking so far:
> >>>
> >>> #1 is given (CSV fetched already);
> >>> #2 simple, should be handled in the context of ExecuteStreamProcessor;
> >>>
> >>> #3 is where I'm scratching my head: I keep re-reading the Description
> >>> field for
> >>> the PutSQL processor in http://nifi.apache.org/docs.html but can't
> seem to
> >>> parse this into what I need to do to prepare a flowfile comprising
> lines of
> >>> comma-separated lines of text into a series of INSERT statements:
> >>>
> >>> "Executes a SQL UPDATE or INSERT command. The content of an incoming
> >>> FlowFile is expected to be the SQL command to execute. The SQL command
> >>> may use the ? to escape parameters. In this case, the parameters to
> >>> use must exist as FlowFile attributes with the naming convention
> >>> sql.args.N.type and sql.args.N.value, where N is a positive integer.
> >>> The sql.args.N.type is expected to be a number indicating the JDBC
> >>> Type."
> >>>
> >>> Of related interest: there seems to be only one CSV-relevant processor
> >>> type in
> >>> v0.3.0, ConvertCSVToAvro; I fear the need to have to do something like
> >>> this:
> >>>
> >>> ConvertCSVToAvro->ConvertAvroToJSON->ConvertJSONToSQL->PutSQL
> >>>
> >>> Guidance, suggestions? Thanks!
> >>>
> >>> Russell
> >>>
> >>> --
> >>> Russell Whitaker
> >>> http://twitter.com/OrthoNormalRuss
> >>> http://www.linkedin.com/pub/russell-whitaker/0/b86/329
> >>>
> >
> >
> >
> > --
> > Russell Whitaker
> > http://twitter.com/OrthoNormalRuss
> > http://www.linkedin.com/pub/russell-whitaker/0/b86/329
>
>
>
> --
> Russell Whitaker
> http://twitter.com/OrthoNormalRuss
> http://www.linkedin.com/pub/russell-whitaker/0/b86/329
>

Reply via email to