This is an automated email from the ASF dual-hosted git repository. granthenke pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/kudu.git
commit e65b6d77a1afec28e3fe8cf592d61a1e4ea8656c
Author: Grant Henke <[email protected]>
AuthorDate: Sun Jul 14 18:20:24 2019 -0500

    [examples] Add a complete Nifi quickstart example

    This patch adds a brief example using Apache NiFi to ingest data into
    Apache Kudu.

    Change-Id: I71f3bc5898c15d7bc19cffb3a91b9efac3f6928b
    Reviewed-on: http://gerrit.cloudera.org:8080/13878
    Tested-by: Grant Henke <[email protected]>
    Reviewed-by: Andrew Wong <[email protected]>
---
 docs/quickstart.adoc                          |   12 +-
 examples/quickstart/nifi/README.adoc          |  165 ++++
 examples/quickstart/nifi/Random_User_Kudu.xml | 1002 +++++++++++++++++++++++++
 examples/quickstart/spark/README.adoc         |    2 +-
 4 files changed, 1174 insertions(+), 7 deletions(-)

diff --git a/docs/quickstart.adoc b/docs/quickstart.adoc
index e02506b..46e06a9 100644
--- a/docs/quickstart.adoc
+++ b/docs/quickstart.adoc
@@ -30,7 +30,7 @@
 Follow these instructions to set up and run a local Kudu Cluster using Docker,
 and get started using Apache Kudu in minutes.
 
-Note: This is intended for demonstration purposes only and shouldn't
+NOTE: This is intended for demonstration purposes only and shouldn't
 be used for production or performance/scale testing.
 
 [[quickstart_vm]]
@@ -48,8 +48,8 @@ Clone the Apache Kudu repository using Git and change to the `kudu` directory:
 
 [source,bash]
 ----
-$ git clone https://github.com/apache/kudu
-$ cd kudu
+git clone https://github.com/apache/kudu
+cd kudu
 ----
 
 == Start the Quickstart Cluster
@@ -60,7 +60,7 @@ Set the `KUDU_QUICKSTART_IP` environment variable to your ip address:
 
 [source,bash]
 ----
-$ export KUDU_QUICKSTART_IP=$(ifconfig | grep "inet " | grep -Fv 127.0.0.1 | awk '{print $2}' | tail -1)
+export KUDU_QUICKSTART_IP=$(ifconfig | grep "inet " | grep -Fv 127.0.0.1 | awk '{print $2}' | tail -1)
 ----
 
 === Bring up the Cluster
@@ -75,7 +75,7 @@ you can specify the master addresses with `localhost:7051,localhost:7151,localho
 docker-compose -f docker/quickstart.yml up
 ----
 
-Note: You can include the `-d` flag to run the cluster in the background.
+NOTE: You can include the `-d` flag to run the cluster in the background.
 
 === View the Web-UI
 
@@ -106,7 +106,7 @@ export KUDU_USER_NAME=kudu
 kudu cluster ksck localhost:7051,localhost:7151,localhost:7251
 ----
 
-Note: Setting `KUDU_USER_NAME=kudu` simplifies using Kudu from various user
+NOTE: Setting `KUDU_USER_NAME=kudu` simplifies using Kudu from various user
 accounts in a non-secure environment.
 
 == Running a Brief Example
diff --git a/examples/quickstart/nifi/README.adoc b/examples/quickstart/nifi/README.adoc
new file mode 100644
index 0000000..3d4e168
--- /dev/null
+++ b/examples/quickstart/nifi/README.adoc
@@ -0,0 +1,165 @@
+= Apache NiFi Quickstart
+
+Below is a brief example using Apache NiFi to ingest data into Apache Kudu.
+
+== Start the Kudu Quickstart Environment
+
+See the Apache Kudu
+link:https://kudu.apache.org/docs/quickstart.html[quickstart documentation]
+to set up and run the Kudu quickstart environment.
+
+== Run Apache NiFi
+
+Use the following command to run the latest Apache NiFi Docker image:
+
+[source,bash]
+----
+docker run --name kudu-nifi --network="docker_default" -p 8080:8080 apache/nifi:latest
+----
+
+You can view the running NiFi instance at link:http://localhost:8080/nifi[localhost:8080/nifi].
+
+NOTE: `--network="docker_default"` is specified to connect the container to the
+same network as the quickstart cluster.
+
+NOTE: You can include the `-d` flag to run the container in the background.
+
+== Create the Kudu table
+
+Create the `random_user` Kudu table that matches the expected Schema.
+
+In order to do this without any dependencies on your host machine, we will
+use the `jshell` REPL in a Docker container to create the table using the
+Java API. First set up the Docker container, download the jar, and run the REPL:
+
+[source,bash]
+----
+docker run -it --rm --network="docker_default" maven:latest bin/bash
+# Download the kudu-client-tools jar which has the kudu-client and all the dependencies.
+mkdir jars
+mvn dependency:copy \
+  -Dartifact=org.apache.kudu:kudu-client-tools:1.10.0 \
+  -DoutputDirectory=jars
+# Run the jshell with the jar on the classpath.
+jshell --class-path jars/*
+----
+
+NOTE: `--network="docker_default"` is specified to connect the container to the
+same network as the quickstart cluster.
+
+Then, once in the `jshell` REPL, create the table using the Java API:
+
+[source,java]
+----
+import org.apache.kudu.client.CreateTableOptions
+import org.apache.kudu.client.KuduClient
+import org.apache.kudu.client.KuduClient.KuduClientBuilder
+import org.apache.kudu.ColumnSchema.ColumnSchemaBuilder
+import org.apache.kudu.Schema
+import org.apache.kudu.Type
+
+KuduClient client =
+  new KuduClientBuilder("kudu-master-1:7051,kudu-master-2:7151,kudu-master-3:7251").build();
+
+if(client.tableExists("random_user")) {
+  client.deleteTable("random_user");
+}
+
+Schema schema = new Schema(Arrays.asList(
+  new ColumnSchemaBuilder("ssn", Type.STRING).key(true).build(),
+  new ColumnSchemaBuilder("firstName", Type.STRING).build(),
+  new ColumnSchemaBuilder("lastName", Type.STRING).build(),
+  new ColumnSchemaBuilder("email", Type.STRING).build())
+);
+CreateTableOptions tableOptions =
+  new CreateTableOptions().setNumReplicas(3).addHashPartitions(Arrays.asList("ssn"), 4);
+client.createTable("random_user", schema, tableOptions);
+----
+
+Once complete, you can use `CTRL + D` to exit the REPL and `exit` to exit the container.
+
+== Load the Dataflow Template
+
+The `Random_User_Kudu.xml` template downloads randomly generated user data from
+http://randomuser.me and then pushes the data into Kudu. The data is pulled in
+100 records at a time and then split into individual records. The incoming data
+is in JSON Format.
+
+Next, the user's social security number, first name, last name, and e-mail
+address are extracted from the JSON into FlowFile Attributes and the content is
+modified to become a new JSON document consisting of only 4 fields:
+`ssn`, `firstName`, `lastName`, and `email`. Finally, this smaller JSON is then pushed to
+Kudu as a single row, each field being a separate column in that row.
+
+To import the template, follow the NiFi
+link:https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Import_Template["Importing a Template" documentation]
+to load `Random_User_Kudu.xml`.
+
+Then follow the NiFi
+link:https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#instantiating-a-template["Instantiating a Template" documentation]
+to add the `Random User Kudu` template to the canvas.
+
+Once the template is added to the canvas you need to start the JsonTreeReader
+controller service. You can do this via the PutKudu processor configuration
+or via the NiFi Flow configuration in the Operate panel. See the NiFi
+link:https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Controller_Services_for_Dataflows["Controller Service" documentation]
+for more details.
+
+Now you can start individual processors by right-clicking each processor and selecting `Start`.
+You can also explore the configuration, queue contents, and more by right-clicking on each element.
+Alternatively you can use the Operate panel and start the entire flow at once.
+More about starting and stopping NiFi components can be read in the NiFi
+link:https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#starting-a-component["Starting a Component" documentation].
+
+== Shutdown NiFi
+
+Once you are done with the NiFi container you can shut it down in a couple of ways.
+If you ran NiFi without the `-d` flag, you can use `ctrl + c` to stop the container.
+
+If you ran NiFi with the `-d` flag, you can use the following to
+gracefully shut down the container:
+
+[source,bash]
+----
+docker stop kudu-nifi
+----
+
+To permanently remove the container run the following:
+
+[source,bash]
+----
+docker rm kudu-nifi
+----
+
+== Next steps
+
+The above example showed how to ingest data into Kudu using Apache NiFi.
+Next explore the other quickstart guides to learn how to query or process
+the data using other tools.
+
+For example, the link:https://github.com/apache/kudu/tree/master/examples/quickstart/spark[Spark quickstart guide]
+will walk you through how to set up and query Kudu tables with the `spark-kudu`
+integration.
+
+If you have already run through the Spark quickstart the following is a brief
+example of the code to allow you to query the `random_user` table:
+
+[source,bash]
+----
+spark-shell --packages org.apache.kudu:kudu-spark2_2.11:1.10.0
+----
+
+[source,scala]
+----
+:paste
+val random_user = spark.read
+  .option("kudu.master", "localhost:7051,localhost:7151,localhost:7251")
+  .option("kudu.table", "random_user")
+  // We need to use leader_only because Kudu on Docker currently doesn't
+  // support Snapshot scans due to `--use_hybrid_clock=false`.
+  .option("kudu.scanLocality", "leader_only")
+  .format("kudu").load
+random_user.createOrReplaceTempView("random_user")
+spark.sql("SELECT count(*) FROM random_user").show()
+spark.sql("SELECT * FROM random_user LIMIT 5").show()
+----
diff --git a/examples/quickstart/nifi/Random_User_Kudu.xml b/examples/quickstart/nifi/Random_User_Kudu.xml
new file mode 100644
index 0000000..158992a
--- /dev/null
+++ b/examples/quickstart/nifi/Random_User_Kudu.xml
@@ -0,0 +1,1002 @@
+<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
+<template encoding-version="1.2">
+    <description>This template downloads randomly generated user data from
+http://randomuser.me and then pushes the data into Kudu. The data is pulled in
+100 records at a time and then split into individual records. The incoming data
+is in JSON Format.
+
+Next, the user's social security number, first name, last name, and e-mail
+address are extracted from the JSON into FlowFile Attributes and the content is
+modified to become a new JSON document consisting of only 4 fields:
+ssn, firstName, lastName, email.
Finally, this smaller JSON is then pushed to +Kudu as a single row, each value being a separate column in that row.</description> + <groupId>00304107-016c-1000-2e69-8f2347fbf5c3</groupId> + <name>Random User Kudu</name> + <snippet> + <connections> + <id>2ebb7ae0-bb19-386d-0000-000000000000</id> + <parentGroupId>3d044f5c-470e-393b-0000-000000000000</parentGroupId> + <backPressureDataSizeThreshold>1 GB</backPressureDataSizeThreshold> + <backPressureObjectThreshold>10000</backPressureObjectThreshold> + <bends> + <x>469.6021968790567</x> + <y>1017.9549013717346</y> + </bends> + <bends> + <x>469.6021968790567</x> + <y>1067.9549013717346</y> + </bends> + <destination> + <groupId>3d044f5c-470e-393b-0000-000000000000</groupId> + <id>d18d7c78-8767-35c5-0000-000000000000</id> + <type>PROCESSOR</type> + </destination> + <flowFileExpiration>0 sec</flowFileExpiration> + <labelIndex>1</labelIndex> + <loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression> + <loadBalancePartitionAttribute></loadBalancePartitionAttribute> + <loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus> + <loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy> + <name></name> + <selectedRelationships>failure</selectedRelationships> + <source> + <groupId>3d044f5c-470e-393b-0000-000000000000</groupId> + <id>d18d7c78-8767-35c5-0000-000000000000</id> + <type>PROCESSOR</type> + </source> + <zIndex>0</zIndex> + </connections> + <connections> + <id>7400e70c-689c-353f-0000-000000000000</id> + <parentGroupId>3d044f5c-470e-393b-0000-000000000000</parentGroupId> + <backPressureDataSizeThreshold>0 MB</backPressureDataSizeThreshold> + <backPressureObjectThreshold>0</backPressureObjectThreshold> + <destination> + <groupId>3d044f5c-470e-393b-0000-000000000000</groupId> + <id>61b913f5-e84d-33c4-0000-000000000000</id> + <type>PROCESSOR</type> + </destination> + <flowFileExpiration>0 sec</flowFileExpiration> + <labelIndex>1</labelIndex> + 
<loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression> + <loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus> + <loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy> + <name></name> + <selectedRelationships>split</selectedRelationships> + <source> + <groupId>3d044f5c-470e-393b-0000-000000000000</groupId> + <id>1f4acd0d-2480-38ea-0000-000000000000</id> + <type>PROCESSOR</type> + </source> + <zIndex>0</zIndex> + </connections> + <connections> + <id>786748c8-7a7c-3dd4-0000-000000000000</id> + <parentGroupId>3d044f5c-470e-393b-0000-000000000000</parentGroupId> + <backPressureDataSizeThreshold>0 MB</backPressureDataSizeThreshold> + <backPressureObjectThreshold>0</backPressureObjectThreshold> + <bends> + <x>173.46475219726562</x> + <y>179.42988967895508</y> + </bends> + <destination> + <groupId>3d044f5c-470e-393b-0000-000000000000</groupId> + <id>1f4acd0d-2480-38ea-0000-000000000000</id> + <type>PROCESSOR</type> + </destination> + <flowFileExpiration>0 sec</flowFileExpiration> + <labelIndex>0</labelIndex> + <loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression> + <loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus> + <loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy> + <name></name> + <selectedRelationships>Response</selectedRelationships> + <source> + <groupId>3d044f5c-470e-393b-0000-000000000000</groupId> + <id>6ada961c-399a-30dd-0000-000000000000</id> + <type>PROCESSOR</type> + </source> + <zIndex>0</zIndex> + </connections> + <connections> + <id>91e420fd-87d5-39e6-0000-000000000000</id> + <parentGroupId>3d044f5c-470e-393b-0000-000000000000</parentGroupId> + <backPressureDataSizeThreshold>0 MB</backPressureDataSizeThreshold> + <backPressureObjectThreshold>0</backPressureObjectThreshold> + <destination> + <groupId>3d044f5c-470e-393b-0000-000000000000</groupId> + <id>d18d7c78-8767-35c5-0000-000000000000</id> + <type>PROCESSOR</type> + </destination> + <flowFileExpiration>0 sec</flowFileExpiration> + 
<labelIndex>1</labelIndex> + <loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression> + <loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus> + <loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy> + <name></name> + <selectedRelationships>failure</selectedRelationships> + <selectedRelationships>success</selectedRelationships> + <source> + <groupId>3d044f5c-470e-393b-0000-000000000000</groupId> + <id>5946c6a3-44fa-3784-0000-000000000000</id> + <type>PROCESSOR</type> + </source> + <zIndex>0</zIndex> + </connections> + <connections> + <id>c518dc9b-e66c-3664-0000-000000000000</id> + <parentGroupId>3d044f5c-470e-393b-0000-000000000000</parentGroupId> + <backPressureDataSizeThreshold>0 MB</backPressureDataSizeThreshold> + <backPressureObjectThreshold>0</backPressureObjectThreshold> + <destination> + <groupId>3d044f5c-470e-393b-0000-000000000000</groupId> + <id>5946c6a3-44fa-3784-0000-000000000000</id> + <type>PROCESSOR</type> + </destination> + <flowFileExpiration>0 sec</flowFileExpiration> + <labelIndex>1</labelIndex> + <loadBalanceCompression>DO_NOT_COMPRESS</loadBalanceCompression> + <loadBalanceStatus>LOAD_BALANCE_NOT_CONFIGURED</loadBalanceStatus> + <loadBalanceStrategy>DO_NOT_LOAD_BALANCE</loadBalanceStrategy> + <name></name> + <selectedRelationships>matched</selectedRelationships> + <source> + <groupId>3d044f5c-470e-393b-0000-000000000000</groupId> + <id>61b913f5-e84d-33c4-0000-000000000000</id> + <type>PROCESSOR</type> + </source> + <zIndex>0</zIndex> + </connections> + <controllerServices> + <id>d8092989-d6ef-3313-0000-000000000000</id> + <parentGroupId>3d044f5c-470e-393b-0000-000000000000</parentGroupId> + <bundle> + <artifact>nifi-record-serialization-services-nar</artifact> + <group>org.apache.nifi</group> + <version>1.9.2</version> + </bundle> + <descriptors> + <entry> + <key>schema-access-strategy</key> + <value> + <name>schema-access-strategy</name> + </value> + </entry> + <entry> + <key>schema-registry</key> + <value> + 
<identifiesControllerService>org.apache.nifi.schemaregistry.services.SchemaRegistry</identifiesControllerService> + <name>schema-registry</name> + </value> + </entry> + <entry> + <key>schema-name</key> + <value> + <name>schema-name</name> + </value> + </entry> + <entry> + <key>schema-version</key> + <value> + <name>schema-version</name> + </value> + </entry> + <entry> + <key>schema-branch</key> + <value> + <name>schema-branch</name> + </value> + </entry> + <entry> + <key>schema-text</key> + <value> + <name>schema-text</name> + </value> + </entry> + <entry> + <key>schema-inference-cache</key> + <value> + <identifiesControllerService>org.apache.nifi.serialization.RecordSchemaCacheService</identifiesControllerService> + <name>schema-inference-cache</name> + </value> + </entry> + <entry> + <key>Date Format</key> + <value> + <name>Date Format</name> + </value> + </entry> + <entry> + <key>Time Format</key> + <value> + <name>Time Format</name> + </value> + </entry> + <entry> + <key>Timestamp Format</key> + <value> + <name>Timestamp Format</name> + </value> + </entry> + </descriptors> + <name>JsonTreeReader</name> + <persistsState>false</persistsState> + <properties> + <entry> + <key>schema-access-strategy</key> + </entry> + <entry> + <key>schema-registry</key> + </entry> + <entry> + <key>schema-name</key> + </entry> + <entry> + <key>schema-version</key> + </entry> + <entry> + <key>schema-branch</key> + </entry> + <entry> + <key>schema-text</key> + </entry> + <entry> + <key>schema-inference-cache</key> + </entry> + <entry> + <key>Date Format</key> + </entry> + <entry> + <key>Time Format</key> + </entry> + <entry> + <key>Timestamp Format</key> + </entry> + </properties> + <state>ENABLED</state> + <type>org.apache.nifi.json.JsonTreeReader</type> + </controllerServices> + <processors> + <id>1f4acd0d-2480-38ea-0000-000000000000</id> + <parentGroupId>3d044f5c-470e-393b-0000-000000000000</parentGroupId> + <position> + <x>5.00505561901673</x> + <y>268.45753564705933</y> + 
</position> + <bundle> + <artifact>nifi-standard-nar</artifact> + <group>org.apache.nifi</group> + <version>1.9.2</version> + </bundle> + <config> + <bulletinLevel>WARN</bulletinLevel> + <comments></comments> + <concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount> + <descriptors> + <entry> + <key>JsonPath Expression</key> + <value> + <name>JsonPath Expression</name> + </value> + </entry> + <entry> + <key>Null Value Representation</key> + <value> + <name>Null Value Representation</name> + </value> + </entry> + </descriptors> + <executionNode>ALL</executionNode> + <lossTolerant>false</lossTolerant> + <penaltyDuration>30 sec</penaltyDuration> + <properties> + <entry> + <key>JsonPath Expression</key> + <value>$.results[*]</value> + </entry> + <entry> + <key>Null Value Representation</key> + <value>empty string</value> + </entry> + </properties> + <runDurationMillis>0</runDurationMillis> + <schedulingPeriod>0 sec</schedulingPeriod> + <schedulingStrategy>TIMER_DRIVEN</schedulingStrategy> + <yieldDuration>1 sec</yieldDuration> + </config> + <executionNodeRestricted>false</executionNodeRestricted> + <name>SplitJson</name> + <relationships> + <autoTerminate>true</autoTerminate> + <name>failure</name> + </relationships> + <relationships> + <autoTerminate>true</autoTerminate> + <name>original</name> + </relationships> + <relationships> + <autoTerminate>false</autoTerminate> + <name>split</name> + </relationships> + <state>STOPPED</state> + <style/> + <type>org.apache.nifi.processors.standard.SplitJson</type> + </processors> + <processors> + <id>5946c6a3-44fa-3784-0000-000000000000</id> + <parentGroupId>3d044f5c-470e-393b-0000-000000000000</parentGroupId> + <position> + <x>5.00036773572856</x> + <y>744.0256629035371</y> + </position> + <bundle> + <artifact>nifi-standard-nar</artifact> + <group>org.apache.nifi</group> + <version>1.9.2</version> + </bundle> + <config> + <bulletinLevel>WARN</bulletinLevel> + <comments></comments> + 
<concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount> + <descriptors> + <entry> + <key>Attributes List</key> + <value> + <name>Attributes List</name> + </value> + </entry> + <entry> + <key>attributes-to-json-regex</key> + <value> + <name>attributes-to-json-regex</name> + </value> + </entry> + <entry> + <key>Destination</key> + <value> + <name>Destination</name> + </value> + </entry> + <entry> + <key>Include Core Attributes</key> + <value> + <name>Include Core Attributes</name> + </value> + </entry> + <entry> + <key>Null Value</key> + <value> + <name>Null Value</name> + </value> + </entry> + </descriptors> + <executionNode>ALL</executionNode> + <lossTolerant>false</lossTolerant> + <penaltyDuration>30 sec</penaltyDuration> + <properties> + <entry> + <key>Attributes List</key> + <value>ssn, firstName, lastName, email</value> + </entry> + <entry> + <key>attributes-to-json-regex</key> + </entry> + <entry> + <key>Destination</key> + <value>flowfile-content</value> + </entry> + <entry> + <key>Include Core Attributes</key> + <value>true</value> + </entry> + <entry> + <key>Null Value</key> + <value>false</value> + </entry> + </properties> + <runDurationMillis>0</runDurationMillis> + <schedulingPeriod>0 sec</schedulingPeriod> + <schedulingStrategy>TIMER_DRIVEN</schedulingStrategy> + <yieldDuration>1 sec</yieldDuration> + </config> + <executionNodeRestricted>false</executionNodeRestricted> + <name>AttributesToJSON</name> + <relationships> + <autoTerminate>false</autoTerminate> + <name>failure</name> + </relationships> + <relationships> + <autoTerminate>false</autoTerminate> + <name>success</name> + </relationships> + <state>STOPPED</state> + <style/> + <type>org.apache.nifi.processors.standard.AttributesToJSON</type> + </processors> + <processors> + <id>61b913f5-e84d-33c4-0000-000000000000</id> + <parentGroupId>3d044f5c-470e-393b-0000-000000000000</parentGroupId> + <position> + <x>6.4349386870426315</x> + <y>504.31885574631224</y> + </position> + <bundle> + 
<artifact>nifi-standard-nar</artifact> + <group>org.apache.nifi</group> + <version>1.9.2</version> + </bundle> + <config> + <bulletinLevel>WARN</bulletinLevel> + <comments></comments> + <concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount> + <descriptors> + <entry> + <key>Destination</key> + <value> + <name>Destination</name> + </value> + </entry> + <entry> + <key>Return Type</key> + <value> + <name>Return Type</name> + </value> + </entry> + <entry> + <key>Path Not Found Behavior</key> + <value> + <name>Path Not Found Behavior</name> + </value> + </entry> + <entry> + <key>Null Value Representation</key> + <value> + <name>Null Value Representation</name> + </value> + </entry> + <entry> + <key>email</key> + <value> + <name>email</name> + </value> + </entry> + <entry> + <key>firstName</key> + <value> + <name>firstName</name> + </value> + </entry> + <entry> + <key>lastName</key> + <value> + <name>lastName</name> + </value> + </entry> + <entry> + <key>ssn</key> + <value> + <name>ssn</name> + </value> + </entry> + </descriptors> + <executionNode>ALL</executionNode> + <lossTolerant>false</lossTolerant> + <penaltyDuration>30 sec</penaltyDuration> + <properties> + <entry> + <key>Destination</key> + <value>flowfile-attribute</value> + </entry> + <entry> + <key>Return Type</key> + <value>auto-detect</value> + </entry> + <entry> + <key>Path Not Found Behavior</key> + <value>ignore</value> + </entry> + <entry> + <key>Null Value Representation</key> + <value>empty string</value> + </entry> + <entry> + <key>email</key> + <value>$.email</value> + </entry> + <entry> + <key>firstName</key> + <value>$.name.first</value> + </entry> + <entry> + <key>lastName</key> + <value>$.name.last</value> + </entry> + <entry> + <key>ssn</key> + <value>$.id.value</value> + </entry> + </properties> + <runDurationMillis>0</runDurationMillis> + <schedulingPeriod>0 sec</schedulingPeriod> + <schedulingStrategy>TIMER_DRIVEN</schedulingStrategy> + <yieldDuration>1 sec</yieldDuration> + 
</config> + <executionNodeRestricted>false</executionNodeRestricted> + <name>EvaluateJsonPath</name> + <relationships> + <autoTerminate>true</autoTerminate> + <name>failure</name> + </relationships> + <relationships> + <autoTerminate>false</autoTerminate> + <name>matched</name> + </relationships> + <relationships> + <autoTerminate>true</autoTerminate> + <name>unmatched</name> + </relationships> + <state>STOPPED</state> + <style/> + <type>org.apache.nifi.processors.standard.EvaluateJsonPath</type> + </processors> + <processors> + <id>6ada961c-399a-30dd-0000-000000000000</id> + <parentGroupId>3d044f5c-470e-393b-0000-000000000000</parentGroupId> + <position> + <x>0.0</x> + <y>0.0</y> + </position> + <bundle> + <artifact>nifi-standard-nar</artifact> + <group>org.apache.nifi</group> + <version>1.9.2</version> + </bundle> + <config> + <bulletinLevel>WARN</bulletinLevel> + <comments></comments> + <concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount> + <descriptors> + <entry> + <key>HTTP Method</key> + <value> + <name>HTTP Method</name> + </value> + </entry> + <entry> + <key>Remote URL</key> + <value> + <name>Remote URL</name> + </value> + </entry> + <entry> + <key>SSL Context Service</key> + <value> + <identifiesControllerService>org.apache.nifi.ssl.SSLContextService</identifiesControllerService> + <name>SSL Context Service</name> + </value> + </entry> + <entry> + <key>Connection Timeout</key> + <value> + <name>Connection Timeout</name> + </value> + </entry> + <entry> + <key>Read Timeout</key> + <value> + <name>Read Timeout</name> + </value> + </entry> + <entry> + <key>Include Date Header</key> + <value> + <name>Include Date Header</name> + </value> + </entry> + <entry> + <key>Follow Redirects</key> + <value> + <name>Follow Redirects</name> + </value> + </entry> + <entry> + <key>Attributes to Send</key> + <value> + <name>Attributes to Send</name> + </value> + </entry> + <entry> + <key>Basic Authentication Username</key> + <value> + <name>Basic 
Authentication Username</name> + </value> + </entry> + <entry> + <key>Basic Authentication Password</key> + <value> + <name>Basic Authentication Password</name> + </value> + </entry> + <entry> + <key>proxy-configuration-service</key> + <value> + <identifiesControllerService>org.apache.nifi.proxy.ProxyConfigurationService</identifiesControllerService> + <name>proxy-configuration-service</name> + </value> + </entry> + <entry> + <key>Proxy Host</key> + <value> + <name>Proxy Host</name> + </value> + </entry> + <entry> + <key>Proxy Port</key> + <value> + <name>Proxy Port</name> + </value> + </entry> + <entry> + <key>Proxy Type</key> + <value> + <name>Proxy Type</name> + </value> + </entry> + <entry> + <key>invokehttp-proxy-user</key> + <value> + <name>invokehttp-proxy-user</name> + </value> + </entry> + <entry> + <key>invokehttp-proxy-password</key> + <value> + <name>invokehttp-proxy-password</name> + </value> + </entry> + <entry> + <key>Put Response Body In Attribute</key> + <value> + <name>Put Response Body In Attribute</name> + </value> + </entry> + <entry> + <key>Max Length To Put In Attribute</key> + <value> + <name>Max Length To Put In Attribute</name> + </value> + </entry> + <entry> + <key>Digest Authentication</key> + <value> + <name>Digest Authentication</name> + </value> + </entry> + <entry> + <key>Always Output Response</key> + <value> + <name>Always Output Response</name> + </value> + </entry> + <entry> + <key>Trusted Hostname</key> + <value> + <name>Trusted Hostname</name> + </value> + </entry> + <entry> + <key>Add Response Headers to Request</key> + <value> + <name>Add Response Headers to Request</name> + </value> + </entry> + <entry> + <key>Content-Type</key> + <value> + <name>Content-Type</name> + </value> + </entry> + <entry> + <key>send-message-body</key> + <value> + <name>send-message-body</name> + </value> + </entry> + <entry> + <key>Use Chunked Encoding</key> + <value> + <name>Use Chunked Encoding</name> + </value> + </entry> + <entry> + 
<key>Penalize on "No Retry"</key> + <value> + <name>Penalize on "No Retry"</name> + </value> + </entry> + <entry> + <key>use-etag</key> + <value> + <name>use-etag</name> + </value> + </entry> + <entry> + <key>etag-max-cache-size</key> + <value> + <name>etag-max-cache-size</name> + </value> + </entry> + </descriptors> + <executionNode>ALL</executionNode> + <lossTolerant>false</lossTolerant> + <penaltyDuration>30 sec</penaltyDuration> + <properties> + <entry> + <key>HTTP Method</key> + <value>GET</value> + </entry> + <entry> + <key>Remote URL</key> + <value>http://api.randomuser.me?nat=us&results=100</value> + </entry> + <entry> + <key>SSL Context Service</key> + </entry> + <entry> + <key>Connection Timeout</key> + <value>5 secs</value> + </entry> + <entry> + <key>Read Timeout</key> + <value>15 secs</value> + </entry> + <entry> + <key>Include Date Header</key> + <value>True</value> + </entry> + <entry> + <key>Follow Redirects</key> + <value>True</value> + </entry> + <entry> + <key>Attributes to Send</key> + </entry> + <entry> + <key>Basic Authentication Username</key> + </entry> + <entry> + <key>Basic Authentication Password</key> + </entry> + <entry> + <key>proxy-configuration-service</key> + </entry> + <entry> + <key>Proxy Host</key> + </entry> + <entry> + <key>Proxy Port</key> + </entry> + <entry> + <key>Proxy Type</key> + <value>http</value> + </entry> + <entry> + <key>invokehttp-proxy-user</key> + </entry> + <entry> + <key>invokehttp-proxy-password</key> + </entry> + <entry> + <key>Put Response Body In Attribute</key> + </entry> + <entry> + <key>Max Length To Put In Attribute</key> + <value>256</value> + </entry> + <entry> + <key>Digest Authentication</key> + <value>false</value> + </entry> + <entry> + <key>Always Output Response</key> + <value>false</value> + </entry> + <entry> + <key>Trusted Hostname</key> + </entry> + <entry> + <key>Add Response Headers to Request</key> + <value>false</value> + </entry> + <entry> + <key>Content-Type</key> + 
<value>${mime.type}</value> + </entry> + <entry> + <key>send-message-body</key> + <value>true</value> + </entry> + <entry> + <key>Use Chunked Encoding</key> + <value>false</value> + </entry> + <entry> + <key>Penalize on "No Retry"</key> + <value>false</value> + </entry> + <entry> + <key>use-etag</key> + <value>false</value> + </entry> + <entry> + <key>etag-max-cache-size</key> + <value>10MB</value> + </entry> + </properties> + <runDurationMillis>0</runDurationMillis> + <schedulingPeriod>10 seconds</schedulingPeriod> + <schedulingStrategy>TIMER_DRIVEN</schedulingStrategy> + <yieldDuration>1 sec</yieldDuration> + </config> + <executionNodeRestricted>false</executionNodeRestricted> + <name>Fetch User Data</name> + <relationships> + <autoTerminate>true</autoTerminate> + <name>Failure</name> + </relationships> + <relationships> + <autoTerminate>true</autoTerminate> + <name>No Retry</name> + </relationships> + <relationships> + <autoTerminate>true</autoTerminate> + <name>Original</name> + </relationships> + <relationships> + <autoTerminate>false</autoTerminate> + <name>Response</name> + </relationships> + <relationships> + <autoTerminate>true</autoTerminate> + <name>Retry</name> + </relationships> + <state>STOPPED</state> + <style/> + <type>org.apache.nifi.processors.standard.InvokeHTTP</type> + </processors> + <processors> + <id>d18d7c78-8767-35c5-0000-000000000000</id> + <parentGroupId>3d044f5c-470e-393b-0000-000000000000</parentGroupId> + <position> + <x>6.6021968790567485</x> + <y>977.9549013717346</y> + </position> + <bundle> + <artifact>nifi-kudu-nar</artifact> + <group>org.apache.nifi</group> + <version>1.9.2</version> + </bundle> + <config> + <bulletinLevel>WARN</bulletinLevel> + <comments></comments> + <concurrentlySchedulableTaskCount>1</concurrentlySchedulableTaskCount> + <descriptors> + <entry> + <key>Kudu Masters</key> + <value> + <name>Kudu Masters</name> + </value> + </entry> + <entry> + <key>Table Name</key> + <value> + <name>Table Name</name> + </value> 
+ </entry> + <entry> + <key>kerberos-credentials-service</key> + <value> + <identifiesControllerService>org.apache.nifi.kerberos.KerberosCredentialsService</identifiesControllerService> + <name>kerberos-credentials-service</name> + </value> + </entry> + <entry> + <key>Skip head line</key> + <value> + <name>Skip head line</name> + </value> + </entry> + <entry> + <key>record-reader</key> + <value> + <identifiesControllerService>org.apache.nifi.serialization.RecordReaderFactory</identifiesControllerService> + <name>record-reader</name> + </value> + </entry> + <entry> + <key>Insert Operation</key> + <value> + <name>Insert Operation</name> + </value> + </entry> + <entry> + <key>Flush Mode</key> + <value> + <name>Flush Mode</name> + </value> + </entry> + <entry> + <key>FlowFiles per Batch</key> + <value> + <name>FlowFiles per Batch</name> + </value> + </entry> + <entry> + <key>Batch Size</key> + <value> + <name>Batch Size</name> + </value> + </entry> + </descriptors> + <executionNode>ALL</executionNode> + <lossTolerant>false</lossTolerant> + <penaltyDuration>30 sec</penaltyDuration> + <properties> + <entry> + <key>Kudu Masters</key> + <value>kudu-master-1:7051,kudu-master-2:7151,kudu-master-3:7251</value> + </entry> + <entry> + <key>Table Name</key> + <value>random_user</value> + </entry> + <entry> + <key>kerberos-credentials-service</key> + </entry> + <entry> + <key>Skip head line</key> + <value>false</value> + </entry> + <entry> + <key>record-reader</key> + <value>d8092989-d6ef-3313-0000-000000000000</value> + </entry> + <entry> + <key>Insert Operation</key> + <value>UPSERT</value> + </entry> + <entry> + <key>Flush Mode</key> + <value>AUTO_FLUSH_BACKGROUND</value> + </entry> + <entry> + <key>FlowFiles per Batch</key> + <value>1</value> + </entry> + <entry> + <key>Batch Size</key> + <value>100</value> + </entry> + </properties> + <runDurationMillis>0</runDurationMillis> + <schedulingPeriod>0 sec</schedulingPeriod> + <schedulingStrategy>TIMER_DRIVEN</schedulingStrategy> 
+ <yieldDuration>1 sec</yieldDuration> + </config> + <executionNodeRestricted>false</executionNodeRestricted> + <name>PutKudu</name> + <relationships> + <autoTerminate>false</autoTerminate> + <name>failure</name> + </relationships> + <relationships> + <autoTerminate>true</autoTerminate> + <name>success</name> + </relationships> + <state>STOPPED</state> + <style/> + <type>org.apache.nifi.processors.kudu.PutKudu</type> + </processors> + </snippet> + <timestamp>07/18/2019 14:13:34 UTC</timestamp> +</template> diff --git a/examples/quickstart/spark/README.adoc b/examples/quickstart/spark/README.adoc index 42953fe..b7ec637 100644 --- a/examples/quickstart/spark/README.adoc +++ b/examples/quickstart/spark/README.adoc @@ -3,7 +3,7 @@ Below is a brief example using Apache Spark to load, query, and modify a real data set in Apache Kudu. -== Start the Kudu Quickstart +== Start the Kudu Quickstart Environment See the Apache Kudu link:https://kudu.apache.org/docs/quickstart.html[quickstart documentation]
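The field selection that the template's EvaluateJsonPath and AttributesToJSON processors are configured to perform can be illustrated outside NiFi. The following is a minimal Java sketch, not NiFi's implementation: it assumes a single user record has already been parsed into nested maps, and the class name `RandomUserFlattenSketch`, its helper methods, and the sample record are hypothetical. Only the four paths (`$.id.value`, `$.name.first`, `$.name.last`, `$.email`) and the four column names are taken from the template.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; it is not part of the commit or of NiFi.
final class RandomUserFlattenSketch {

    // Walks a nested map along a dot-separated path such as "name.first".
    // Returns an empty string when any step is missing, a simplification
    // loosely modelled on the template's "empty string" null representation.
    static String lookup(Map<String, Object> record, String path) {
        Object current = record;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return "";
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current == null ? "" : current.toString();
    }

    // Builds the four-column row that the flow sends to the random_user table.
    static Map<String, String> flatten(Map<String, Object> record) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("ssn", lookup(record, "id.value"));
        row.put("firstName", lookup(record, "name.first"));
        row.put("lastName", lookup(record, "name.last"));
        row.put("email", lookup(record, "email"));
        return row;
    }

    public static void main(String[] args) {
        // Made-up sample record shaped like the paths used in the template.
        Map<String, Object> record = Map.of(
            "id", Map.of("name", "SSN", "value", "000-00-0000"),
            "name", Map.of("title", "Ms", "first", "Ada", "last", "Example"),
            "email", "ada.example@example.com");
        System.out.println(flatten(record));
    }
}
```

Keeping the result in a `LinkedHashMap` preserves the column order used in the table schema, which makes the printed row easier to compare against the `random_user` table.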
