[ 
https://issues.apache.org/jira/browse/NIFI-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190885#comment-17190885
 ] 

Ricky Saltzer commented on NIFI-7791:
-------------------------------------

Hey Joe -

Yeah I totally understand where you're coming from, and I believe it's already 
possible to use the JDBC library to achieve just that when writing to 
ClickHouse. The point of not using the recordreader/writer is to allow complete 
offloading of the processing to the database, which is one of ClickHouse's 
capabilities[1]. This is beneficial where you have a set of really 
large/capable ClickHouse machines and extremely large (10s of GBs) files you 
wish to write.
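
To make the offloading idea concrete, here is a rough sketch (not the code from 
my PR) of how a raw file could be handed to ClickHouse over its HTTP interface 
so that the server does all of the parsing. The class name, the helper names 
and the "events" table are hypothetical; the default HTTP port 8123 and the 
X-ClickHouse-User / X-ClickHouse-Key headers are what I understand the 
ClickHouse HTTP interface to accept, so verify them against your version.

```java
import java.io.FileNotFoundException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, for illustration only.
public class ClickHouseStreamSketch {

    // Builds "INSERT INTO <table> [(cols)] FORMAT <format>".
    // ClickHouse parses the request body in the named format, so no
    // record reader or schema is needed on the client side.
    public static String insertQuery(String table, String format, List<String> columns) {
        String cols = columns.isEmpty() ? "" : " (" + String.join(", ", columns) + ")";
        return "INSERT INTO " + table + cols + " FORMAT " + format;
    }

    // Puts the statement into the "query" URL parameter.
    // URLEncoder emits '+' for spaces; swap in %20 to stay unambiguous.
    public static URI endpoint(String host, int port, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create("http://" + host + ":" + port + "/?query=" + encoded);
    }

    // Builds a POST whose body is streamed from the file, with credentials
    // passed as headers (assumed header names, see the note above).
    public static HttpRequest request(URI uri, String user, String password, Path file)
            throws FileNotFoundException {
        return HttpRequest.newBuilder(uri)
                .header("X-ClickHouse-User", user)
                .header("X-ClickHouse-Key", password)
                .POST(HttpRequest.BodyPublishers.ofFile(file))
                .build();
    }
}
```

Sending the result of request(...) with java.net.http.HttpClient streams the 
file from disk, so the client never has to interpret the data itself.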

There may not be enough demand for this processor, and I can see how adding a 
custom processor for something that's already possible might pollute the 
already massive number of processors. I've had a lot of success using it 
internally, since it gives a really fast turnaround for dumping data that I 
don't want to bother applying a schema to within NiFi.

[1] [https://clickhouse.tech/docs/en/interfaces/formats/]

> Add PutClickHouse Processor for Writing Large Streams
> -----------------------------------------------------
>
>                 Key: NIFI-7791
>                 URL: https://issues.apache.org/jira/browse/NIFI-7791
>             Project: Apache NiFi
>          Issue Type: New Feature
>            Reporter: Ricky Saltzer
>            Assignee: Ricky Saltzer
>            Priority: Minor
>
> ClickHouse supports streaming a number of file formats directly using their 
> JDBC (superset) library. Often it's much more convenient to stream the 
> contents of a file directly to ClickHouse, rather than bothering to process 
> the data in NiFi and then using the native JDBC processor.
> One workaround is to just use PutHTTP to stream the file directly to 
> ClickHouse using its HTTP endpoint. However, this can get a bit tedious, 
> especially if you need to pass credentials as part of the HTTP method call.
> I'm creating this Jira to support creating a simple PutClickHouse processor 
> that can stream a FlowFile directly to ClickHouse with the following features:
>  * CSV, CSVWithNames, TSV and JSONEachRow
>  * Ability to modify column name ordering
>  * Custom delimiters for CSV and TSV
>  * SSL support (with and without strict mode)
>  * Multiple hosts (comma separated) to utilize the 
> {{BalancedClickhouseDataSource}}
>  * Username and Password
> I'm currently wrapping up a PR for this. I wrote it using Kotlin, which uses 
> a processor-scope Maven plugin. If there's enough objection, it can be 
> rewritten in native Java.
> +[~joewitt] since I spoke with him regarding this a while back.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
