[GitHub] [nifi] bejancsaba commented on a diff in pull request #6344: NIFI-10403 Add processor supporting the new BigQuery Write API

GitBox Sat, 17 Sep 2022 04:45:10 -0700


bejancsaba commented on code in PR #6344:
URL: https://github.com/apache/nifi/pull/6344#discussion_r973576458



##########
nifi-nar-bundles/nifi-gcp-bundle/nifi-gcp-processors/src/main/resources/docs/org.apache.nifi.processors.gcp.bigquery.PutBigQuery/additionalDetails.html:
##########
@@ -0,0 +1,58 @@
+<!DOCTYPE html>
+<html lang="en" xmlns="http://www.w3.org/1999/html">
+<!--
+      Licensed to the Apache Software Foundation (ASF) under one or more
+      contributor license agreements.  See the NOTICE file distributed with
+      this work for additional information regarding copyright ownership.
+      The ASF licenses this file to You under the Apache License, Version 2.0
+      (the "License"); you may not use this file except in compliance with
+      the License.  You may obtain a copy of the License at
+          http://www.apache.org/licenses/LICENSE-2.0
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+      See the License for the specific language governing permissions and
+      limitations under the License.
+    -->
+
+<head>
+    <meta charset="utf-8"/>
+    <title>PutBigQuery</title>
+    <link rel="stylesheet" href="../../../../../css/component-usage.css" type="text/css"/>
+</head>
+<body>
+
+<h1>Streaming Versus Batching Data</h1>
+
+<p>
+    PutBigQuery is record-based and relies on the gRPC-based Write API, which uses protocol buffers. The underlying stream supports both streaming and batching approaches.
+</p>
+
+<h3>Streaming</h3>
+<p>
+    With streaming, data appended to the stream is instantly available in BigQuery for reading. The number of records (rows) appended at once is configurable.
+    Only one stream is established per FlowFile, so when processing of a FlowFile concludes, the stream is closed and a new one is opened for the next FlowFile.
+    Streaming supports exactly-once delivery semantics via stream offsets.
+</p>
+
+<h3>Batching</h3>
+<p>
+    As with the streaming approach, one stream is opened for each FlowFile and records are appended to the stream. However, data is not available in BigQuery until it is
+    committed by the processor at the end of the FlowFile processing.
+</p>
+
+<h1>Improvement opportunities</h1>
+<p>
+    <ul>
+        <li>The table has to exist on the BigQuery side; it is not created automatically.</li>
+        <li>The Write API supports multiple streams for parallel execution and transactionality across streams. This is not utilized at the moment, as it would be covered at the NiFi framework level.</li>
+    </ul>
+</p>
+
+<p>
+    You can find additional details on the official <a href="https://cloud.google.com/bigquery/docs/write-api">Write API site</a>

Review Comment:
   Thanks, applied



