This is an automated email from the ASF dual-hosted git repository. mcvsubbu pushed a commit to branch 0.2.0 in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git
commit dd0c10d645a4bb5d118b64a9631818f6ee29aedf
Author: Dominique Adapon <[email protected]>
AuthorDate: Fri Oct 4 10:49:58 2019 -0700

    Update Getting Started documentation. (#4615)

    * Update Getting Started documentation.

    Updated Getting Started documentation to include a CSV config file and a specific CSV file. Also updated minor grammar issues and version number.

    * Update Getting Started documentation

    Updated Getting Started documentation to include a specific CSV file and a CSV config file. Also updated minor grammar issues and created variables for version number and working directory, as well as shortened all commands by navigating to pinot-admin.sh.

    * Update Getting Started documentation

    Updated Getting Started documentation again with clearer instructions on where to store the data and config files.

    * Update Getting Started Documentation

    * Update Getting Started documentation.

    Cleaned up minor errors and clarified instructions.
---
 docs/getting_started.rst | 123 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 97 insertions(+), 26 deletions(-)

diff --git a/docs/getting_started.rst b/docs/getting_started.rst
index 4c3e5f6..577cf0e 100644
--- a/docs/getting_started.rst
+++ b/docs/getting_started.rst
@@ -41,7 +41,11 @@ Pinot requires JDK 8 or later and Apache Maven 3.

 #. Check out the code from GitHub (https://github.com/apache/incubator-pinot)
 #. With Maven installed, run ``mvn install package -DskipTests -Pbin-dist`` in the directory in which you checked out Pinot.
-#. Make the generated scripts executable ``cd pinot-distribution/target/apache-pinot-incubating-<version>-SNAPSHOT-bin; chmod +x bin/*.sh``
+#. Make the generated scripts executable:
+
+.. code-block:: none
+
+  cd pinot-distribution/target/apache-pinot-incubating-<version>-SNAPSHOT-bin/apache-pinot-incubating-<version>-SNAPSHOT-bin; chmod +x bin/*.sh

 Trying out Offline quickstart demo
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -97,10 +101,10 @@ last events that were ingested by Pinot.
 Experimenting with Pinot
 ~~~~~~~~~~~~~~~~~~~~~~~~
-Now we have a quick start Pinot cluster running locally. The below shows a step-by-step instruction on
-how to add a simple table to the Pinot system, how to upload segments, and how to query it.
+Now we have a quick start Pinot cluster running locally. Below are step-by-step instructions on
+how to add a simple table to the Pinot system, how to upload a segment, and how to query the segment.

-Suppose we have a transcript in CSV format containing students' basic info and their scores of each subject.
+Suppose we have a transcript in CSV format containing students' basic info and their scores for each subject.

 +------------+------------+-----------+-----------+-----------+-----------+
 | studentID  | firstName  | lastName  | gender    | subject   | score     |
 +------------+------------+-----------+-----------+-----------+-----------+
@@ -114,7 +118,53 @@ Suppose we have a transcript in CSV format containing students' basic info and t
 | 202        | Nick       | Young     | Male      | Physics   | 3.6       |
 +------------+------------+-----------+-----------+-----------+-----------+

-Firstly in order to set up a table, we need to specify the schema of this transcript.
+When we create a CSV file, we will also need a separate CSV config JSON file.
+
+First, however, we will create a working directory called ``getting-started`` (in this example, it is on ``Desktop``), and create two additional directories within ``getting-started`` called ``data``
+and ``config``.
+
+Note that we can create a variable for the working directory called ``WORKING_DIR``.
+
+.. code-block:: none
+
+  $ mkdir getting-started
+  $ WORKING_DIR=/Users/host1/Desktop/getting-started
+  $ cd $WORKING_DIR
+  $ mkdir getting-started/data
+  $ mkdir getting started/config
+
+We will create the transcript CSV file in ``data``, and the CSV config file in ``config``.
+
+.. code-block:: none
+
+  $ touch getting-started/data/test.csv
+  $ touch getting-started/config/csv-record-reader-config.json
+
+The ``test.csv`` file should look like this, with no header line at the top:
+
+.. code-block:: none
+
+  200,Lucy,Smith,Female,Maths,3.8
+  200,Lucy,Smith,Female,English,3.5
+  201,Bob,King,Male,Maths,3.2
+  202,Nick,Young,Male,Physics,3.6
+
+Instead of using a header line, we will use the CSV config JSON file ``csv-record-reader-config.json`` to specify the header:
+
+.. code-block:: none
+
+  {
+    "header":"studentID,firstName,lastName,gender,subject,score",
+    "fileFormat":"CSV"
+  }
+
+In order to set up a table, we need to specify the schema of this transcript in ``transcript-schema.json``, which we will store in ``config``:
+
+.. code-block:: none
+
+  $ touch getting-started/config/transcript-schema.json
+
+``transcript-schema.json`` should look like this:

 .. code-block:: none

@@ -150,15 +200,24 @@ Firstly in order to set up a table, we need to specify the schema of this transc
     ]
   }

-To upload the schema, we can use the command below:
+To upload the schema, we can navigate to the directory in ``pinot-distribution`` that contains
+``pinot-admin.sh``, and use the command below:

 .. code-block:: none

-  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh AddSchema -schemaFile /Users/host1/transcript-schema.json -exec
-  Executing command: AddSchema -controllerHost [controller_host] -controllerPort 9000 -schemaFilePath /Users/host1/transcript-schema.json -exec
-  Sending request: http://[controller_host]:9000/schemas to controller: [controller_host], version: 0.1.0-SNAPSHOT-2c5d42a908213122ab0ad8b7ac9524fcf390e4cb
+  $ VERSION=0.2.0
+  $ cd ./pinot-distribution/target/apache-pinot-incubating-$VERSION-SNAPSHOT-bin/apache-pinot-incubating-$VERSION-SNAPSHOT-bin/bin
+  $ ./pinot-admin.sh AddSchema -schemaFile $WORKING_DIR/config/transcript-schema.json -exec
+  Executing command: AddSchema -controllerHost [controller_host] -controllerPort 9000 -schemaFilePath /Users/host1/Desktop/getting-started/config/transcript-schema.json -exec
+  Sending request: http://[controller_host]:9000/schemas to controller: [controller_host], version: 0.2.0-SNAPSHOT-68092ab9eb83af173d725ec685c22ba4eb5bacf9

-Then, we need to specify the table config which links the schema to this table:
+Then, we need to specify the table config in another JSON file (also stored in ``config``), which links the schema to the table:
+
+.. code-block:: none
+
+  $ touch getting-started/config/transcript-table-config.json
+
+``transcript-table-config.json`` should look like this:

 .. code-block:: none

@@ -186,17 +245,29 @@ And upload the table config to Pinot cluster:

 .. code-block:: none

-  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh AddTable -filePath /Users/host1/transcript-table-config.json -exec
-  Executing command: AddTable -filePath /Users/host1/transcript-table-config.json -controllerHost [controller_host] -controllerPort 9000 -exec
+  $ ./pinot-admin.sh AddTable -filePath $WORKING_DIR/config/transcript-table-config.json -exec
+  Executing command: AddTable -filePath /Users/host1/Desktop/getting-started/config/transcript-table-config.json -controllerHost [controller_host] -controllerPort 9000 -exec
   {"status":"Table transcript_OFFLINE successfully added"}

-In order to upload our data to Pinot cluster, we need to convert our CSV file to Pinot Segment:
+At this point, the directory tree for our ``getting-started`` should look like this:
+
+.. code-block:: none
+
+  |-- getting-started
+      |-- data
+          |-- test.csv
+      |-- config
+          |-- csv-record-reader-config.json
+          |-- transcript-schema.json
+          |-- transcript-table-config.json
+
+In order to upload our data to the Pinot cluster, we need to convert our CSV file into a Pinot Segment, which will be put in a new directory $WORKING_DIR/test2:

 .. code-block:: none

-  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh CreateSegment -dataDir /Users/host1/Desktop/test/ -format CSV -outDir /Users/host1/Desktop/test2/ -tableName transcript -segmentName transcript_0 -overwrite -schemaFile /Users/host1/transcript-schema.json
-  Executing command: CreateSegment -generatorConfigFile null -dataDir /Users/host1/Desktop/test/ -format CSV -outDir /Users/host1/Desktop/test2/ -overwrite true -tableName transcript -segmentName transcript_0 -timeColumnName null -schemaFile /Users/host1/transcript-schema.json -readerConfigFile null -enableStarTreeIndex false -starTreeIndexSpecFile null -hllSize 9 -hllColumns null -hllSuffix _hll -numThreads 1
-  Accepted files: [/Users/host1/Desktop/test/Transcript.csv]
+  $ ./pinot-admin.sh CreateSegment -dataDir $WORKING_DIR/data -format CSV -outDir $WORKING_DIR/test2 -tableName transcript -segmentName transcript_0 -overwrite -schemaFile $WORKING_DIR/config/transcript-schema.json -readerConfigFile $WORKING_DIR/config/csv-record-reader-config.json
+  Executing command: CreateSegment -generatorConfigFile null -dataDir /Users/host1/Desktop/getting-started/data -format CSV -outDir /Users/host1/Desktop/getting-started/test2 -overwrite true -tableName transcript -segmentName transcript_0 -timeColumnName null -schemaFile /Users/host1/Desktop/getting-started/config/transcript-schema.json -readerConfigFile /Users/host1/Desktop/getting-started/config/csv-record-reader-config.json -enableStarTreeIndex false -starTreeIndexSpecFile null -hllS [...]
+  Accepted files: [file:/Users/host1/Desktop/getting-started/data/test.csv]
   Finished building StatsCollector!
   Collected stats for 4 documents
   Created dictionary for STRING column: studentID with cardinality: 1, max length in bytes: 4, range: null to null
@@ -208,30 +279,30 @@ In order to upload our data to Pinot cluster, we need to convert our CSV file to
   Start building IndexCreator!
   Finished records indexing in IndexCreator!
   Finished segment seal!
-  Converting segment: /Users/host1/Desktop/test2/transcript_0_0 to v3 format
-  v3 segment location for segment: transcript_0_0 is /Users/host1/Desktop/test2/transcript_0_0/v3
-  Deleting files in v1 segment directory: /Users/host1/Desktop/test2/transcript_0_0
+  Converting segment: /Users/host1/Desktop/getting-started/test2/transcript_0_0 to v3 format
+  v3 segment location for segment: transcript_0_0 is /Users/host1/Desktop/getting-started/test2/transcript_0_0/v3
+  Deleting files in v1 segment directory: /Users/host1/Desktop/getting-started/test2/transcript_0_0
   Driver, record read time : 1
   Driver, stats collector time : 0
   Driver, indexing time : 0

-Once we have the Pinot segment, we can upload this segment to our cluster:
+Once we have the Pinot Segment, we can upload it to our cluster:

 .. code-block:: none

-  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh UploadSegment -segmentDir /Users/host1/Desktop/test2/
+  $ ./pinot-admin.sh UploadSegment -segmentDir $WORKING_DIR/test2/
   Executing command: UploadSegment -controllerHost [controller_host] -controllerPort 9000 -segmentDir /Users/host1/Desktop/test2/
   Compressing segment transcript_0_0
   Uploading segment transcript_0_0.tar.gz
-  Sending request: http://[controller_host]:9000/v2/segments to controller: [controller_host], version: 0.1.0-SNAPSHOT-2c5d42a908213122ab0ad8b7ac9524fcf390e4cb
+  Sending request: http://[controller_host]:9000/v2/segments to controller: [controller_host], version: 0.2.0-SNAPSHOT-68092ab9eb83af173d725ec685c22ba4eb5bacf9

-You made it! Now we can query the data in Pinot:
+You did it! Now we can query the data in Pinot.

 To get all the number of rows in the table:

 .. code-block:: none

-  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh PostQuery -brokerPort 8000 -query "select count(*) from transcript"
+  $ ./pinot-admin.sh PostQuery -brokerPort 8000 -query "select count(*) from transcript"
   Executing command: PostQuery -brokerHost [controller_host] -brokerPort 8000 -query select count(*) from transcript
   Result: {"aggregationResults":[{"function":"count_star","value":"4"}],"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":1,"numSegmentsProcessed":1,"numSegmentsMatched":1,"numDocsScanned":4,"numEntriesScannedInFilter":0,"numEntriesScannedPostFilter":0,"numGroupsLimitReached":false,"totalDocs":4,"timeUsedMs":7,"segmentStatistics":[],"traceInfo":{}}

@@ -239,7 +310,7 @@ To get the average score of subject Maths:

 .. code-block:: none

-  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh PostQuery -brokerPort 8000 -query "select avg(score) from transcript where subject = \"Maths\""
+  $ ./pinot-admin.sh PostQuery -brokerPort 8000 -query "select avg(score) from transcript where subject = \"Maths\""
   Executing command: PostQuery -brokerHost [controller_host] -brokerPort 8000 -query select avg(score) from transcript where subject = "Maths"
   Result: {"aggregationResults":[{"function":"avg_score","value":"3.50000"}],"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":1,"numSegmentsProcessed":1,"numSegmentsMatched":1,"numDocsScanned":2,"numEntriesScannedInFilter":4,"numEntriesScannedPostFilter":2,"numGroupsLimitReached":false,"totalDocs":4,"timeUsedMs":33,"segmentStatistics":[],"traceInfo":{}}

@@ -247,6 +318,6 @@ To get the average score for Lucy Smith:

 .. code-block:: none

-  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh PostQuery -brokerPort 8000 -query "select avg(score) from transcript where firstName = \"Lucy\" and lastName = \"Smith\""
+  $ ./pinot-admin.sh PostQuery -brokerPort 8000 -query "select avg(score) from transcript where firstName = \"Lucy\" and lastName = \"Smith\""
   Executing command: PostQuery -brokerHost [controller_host] -brokerPort 8000 -query select avg(score) from transcript where firstName = "Lucy" and lastName = "Smith"
   Result: {"aggregationResults":[{"function":"avg_score","value":"3.65000"}],"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":1,"numSegmentsProcessed":1,"numSegmentsMatched":1,"numDocsScanned":2,"numEntriesScannedInFilter":6,"numEntriesScannedPostFilter":2,"numGroupsLimitReached":false,"totalDocs":4,"timeUsedMs":67,"segmentStatistics":[],"traceInfo":{}}
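
The file layout that the updated instructions build up by hand can also be produced in one pass. The following is a minimal sketch and is not part of the commit. It assumes a POSIX shell with ``awk`` available, and it uses a hypothetical ``BASE_DIR`` (a temporary directory by default) in place of ``/Users/host1/Desktop``. The file names and CSV contents are taken from the patch. The ``awk`` lines are only a local cross-check of the two ``avg(score)`` queries and do not contact Pinot.

```shell
#!/bin/sh
set -eu

# Assumption: BASE_DIR is a hypothetical stand-in for /Users/host1/Desktop.
BASE_DIR="${BASE_DIR:-$(mktemp -d)}"
WORKING_DIR="$BASE_DIR/getting-started"

# Create the working directory with its data/ and config/ subdirectories.
mkdir -p "$WORKING_DIR/data" "$WORKING_DIR/config"

# Transcript data, written without a header line.
cat > "$WORKING_DIR/data/test.csv" <<'EOF'
200,Lucy,Smith,Female,Maths,3.8
200,Lucy,Smith,Female,English,3.5
201,Bob,King,Male,Maths,3.2
202,Nick,Young,Male,Physics,3.6
EOF

# The record reader config supplies the header instead.
cat > "$WORKING_DIR/config/csv-record-reader-config.json" <<'EOF'
{
  "header":"studentID,firstName,lastName,gender,subject,score",
  "fileFormat":"CSV"
}
EOF

# Local cross-check of the averages.
# Columns: 2 = firstName, 3 = lastName, 5 = subject, 6 = score.
MATHS_AVG=$(awk -F, '$5 == "Maths" { s += $6; n++ } END { printf "%.2f", s / n }' \
  "$WORKING_DIR/data/test.csv")
LUCY_AVG=$(awk -F, '$2 == "Lucy" && $3 == "Smith" { s += $6; n++ } END { printf "%.2f", s / n }' \
  "$WORKING_DIR/data/test.csv")

echo "Working directory: $WORKING_DIR"
echo "Average Maths score: $MATHS_AVG"
echo "Average score for Lucy Smith: $LUCY_AVG"
```

If the data file matches the patch, the two printed averages should agree with the ``3.50000`` and ``3.65000`` values in the query results. The schema and table config files are left out of the sketch because their contents are not shown in the diff.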
