[incubator-pinot] 11/13: Update Getting Started documentation. (#4615)

mcvsubbu Mon, 04 Nov 2019 14:33:32 -0800

This is an automated email from the ASF dual-hosted git repository.

mcvsubbu pushed a commit to branch 0.2.0
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git


commit dd0c10d645a4bb5d118b64a9631818f6ee29aedf
Author: Dominique Adapon <[email protected]>
AuthorDate: Fri Oct 4 10:49:58 2019 -0700

    Update Getting Started documentation. (#4615)
    
    * Update Getting Started documentation.
    Updated Getting Started documentation to include a
    CSV config file and a specific CSV file. Also updated
    minor grammar issues and version number.
    
    * Update Getting Started documentation
    
    Updated Getting Started documentation to include a
    specific CSV file and a CSV config file. Also updated
    minor grammar issues and created variables for
    version number and working directory, as well as
    shortened all commands by navigating to pinot-admin.sh.
    
    * Update Getting Started documentation
    
    Updated Getting Started documentation again with
    clearer instructions on where to store the data and config files.
    
    * Update Getting Started Documentation
    
    * Update Getting Started documentation.
    
    Cleaned up minor errors and clarified instructions.
---
 docs/getting_started.rst | 123 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 97 insertions(+), 26 deletions(-)

diff --git a/docs/getting_started.rst b/docs/getting_started.rst
index 4c3e5f6..577cf0e 100644
--- a/docs/getting_started.rst
+++ b/docs/getting_started.rst
@@ -41,7 +41,11 @@ Pinot requires JDK 8 or later and Apache Maven 3.
 
 #. Check out the code from GitHub (https://github.com/apache/incubator-pinot)
 #. With Maven installed, run ``mvn install package -DskipTests -Pbin-dist`` in the directory in which you checked out Pinot.
-#. Make the generated scripts executable ``cd pinot-distribution/target/apache-pinot-incubating-<version>-SNAPSHOT-bin; chmod +x bin/*.sh``
+#. Make the generated scripts executable:
+
+.. code-block:: none
+
+  cd pinot-distribution/target/apache-pinot-incubating-<version>-SNAPSHOT-bin/apache-pinot-incubating-<version>-SNAPSHOT-bin; chmod +x bin/*.sh
 
 Trying out Offline quickstart demo
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -97,10 +101,10 @@ last events that were ingested by Pinot.
 Experimenting with Pinot
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
-Now we have a quick start Pinot cluster running locally. The below shows a step-by-step instruction on
-how to add a simple table to the Pinot system, how to upload segments, and how to query it.
+Now we have a quick start Pinot cluster running locally. Below are step-by-step instructions on
+how to add a simple table to the Pinot system, how to upload a segment, and how to query the segment.
 
-Suppose we have a transcript in CSV format containing students' basic info and their scores of each subject.
+Suppose we have a transcript in CSV format containing students' basic info and their scores for each subject.
 
 +------------+------------+-----------+-----------+-----------+-----------+
 | studentID  | firstName  | lastName  |   gender  |  subject  |   score   |
@@ -114,7 +118,53 @@ Suppose we have a transcript in CSV format containing students' basic info and t
 |     202    |     Nick   |   Young   |    Male   |  Physics  |    3.6    |
 +------------+------------+-----------+-----------+-----------+-----------+
 
-Firstly in order to set up a table, we need to specify the schema of this transcript.
+When we create a CSV file, we will also need a separate CSV config JSON file.
+
+First, however, we will create a working directory called ``getting-started`` (in this example, it is on ``Desktop``), and create two additional directories within ``getting-started`` called ``data``
+and ``config``.
+
+Note that we can create a variable for the working directory called ``WORKING_DIR``.
+
+.. code-block:: none
+
+  $ cd /Users/host1/Desktop
+  $ mkdir getting-started
+  $ WORKING_DIR=/Users/host1/Desktop/getting-started
+  $ mkdir getting-started/data
+  $ mkdir getting-started/config
+
+We will create the transcript CSV file in ``data``, and the CSV config file in ``config``.
+
+.. code-block:: none
+
+  $ touch getting-started/data/test.csv
+  $ touch getting-started/config/csv-record-reader-config.json
+
+The ``test.csv`` file should look like this, with no header line at the top:
+
+.. code-block:: none
+
+  200,Lucy,Smith,Female,Maths,3.8
+  200,Lucy,Smith,Female,English,3.5
+  201,Bob,King,Male,Maths,3.2
+  202,Nick,Young,Male,Physics,3.6
+
+Instead of using a header line, we will use the CSV config JSON file ``csv-record-reader-config.json`` to specify the header:
+
+.. code-block:: none
+
+  {
+    "header":"studentID,firstName,lastName,gender,subject,score",
+    "fileFormat":"CSV"
+  }
+
+In order to set up a table, we need to specify the schema of this transcript in ``transcript-schema.json``, which we will store in ``config``:
+
+.. code-block:: none
+
+  $ touch getting-started/config/transcript-schema.json
+
+``transcript-schema.json`` should look like this:
 
 .. code-block:: none
 
@@ -150,15 +200,24 @@ Firstly in order to set up a table, we need to specify the schema of this transc
     ]
   }
 
-To upload the schema, we can use the command below:
+To upload the schema, we can navigate to the directory in ``pinot-distribution`` that contains
+``pinot-admin.sh``, and use the command below:
 
 .. code-block:: none
 
-  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh AddSchema -schemaFile /Users/host1/transcript-schema.json -exec
-  Executing command: AddSchema -controllerHost [controller_host] -controllerPort 9000 -schemaFilePath /Users/host1/transcript-schema.json -exec
-  Sending request: http://[controller_host]:9000/schemas to controller: [controller_host], version: 0.1.0-SNAPSHOT-2c5d42a908213122ab0ad8b7ac9524fcf390e4cb
+  $ VERSION=0.2.0
+  $ cd ./pinot-distribution/target/apache-pinot-incubating-$VERSION-SNAPSHOT-bin/apache-pinot-incubating-$VERSION-SNAPSHOT-bin/bin
+  $ ./pinot-admin.sh AddSchema -schemaFile $WORKING_DIR/config/transcript-schema.json -exec
+  Executing command: AddSchema -controllerHost [controller_host] -controllerPort 9000 -schemaFilePath /Users/host1/Desktop/getting-started/config/transcript-schema.json -exec
+  Sending request: http://[controller_host]:9000/schemas to controller: [controller_host], version: 0.2.0-SNAPSHOT-68092ab9eb83af173d725ec685c22ba4eb5bacf9
 
-Then, we need to specify the table config which links the schema to this table:
+Then, we need to specify the table config in another JSON file (also stored in ``config``), which links the schema to the table:
+
+.. code-block:: none
+
+  $ touch getting-started/config/transcript-table-config.json
+
+``transcript-table-config.json`` should look like this:
 
 .. code-block:: none
 
@@ -186,17 +245,29 @@ And upload the table config to Pinot cluster:
 
 .. code-block:: none
 
-  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh AddTable -filePath /Users/host1/transcript-table-config.json -exec
-  Executing command: AddTable -filePath /Users/host1/transcript-table-config.json -controllerHost [controller_host] -controllerPort 9000 -exec
+  $ ./pinot-admin.sh AddTable -filePath $WORKING_DIR/config/transcript-table-config.json -exec
+  Executing command: AddTable -filePath /Users/host1/Desktop/getting-started/config/transcript-table-config.json -controllerHost [controller_host] -controllerPort 9000 -exec
   {"status":"Table transcript_OFFLINE successfully added"}
 
-In order to upload our data to Pinot cluster, we need to convert our CSV file to Pinot Segment:
+At this point, the directory tree for our ``getting-started`` should look like this:
+
+.. code-block:: none
+
+  |-- getting-started
+      |-- data
+             |-- test.csv
+      |-- config
+             |-- csv-record-reader-config.json
+             |-- transcript-schema.json
+             |-- transcript-table-config.json
+
+In order to upload our data to the Pinot cluster, we need to convert our CSV file into a Pinot Segment, which will be put in a new directory $WORKING_DIR/test2:
 
 .. code-block:: none
 
-  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh CreateSegment -dataDir /Users/host1/Desktop/test/ -format CSV -outDir /Users/host1/Desktop/test2/ -tableName transcript -segmentName transcript_0 -overwrite -schemaFile /Users/host1/transcript-schema.json
-  Executing command: CreateSegment  -generatorConfigFile null -dataDir /Users/host1/Desktop/test/ -format CSV -outDir /Users/host1/Desktop/test2/ -overwrite true -tableName transcript -segmentName transcript_0 -timeColumnName null -schemaFile /Users/host1/transcript-schema.json -readerConfigFile null -enableStarTreeIndex false -starTreeIndexSpecFile null -hllSize 9 -hllColumns null -hllSuffix _hll -numThreads 1
-  Accepted files: [/Users/host1/Desktop/test/Transcript.csv]
+  $ ./pinot-admin.sh CreateSegment -dataDir $WORKING_DIR/data -format CSV -outDir $WORKING_DIR/test2 -tableName transcript -segmentName transcript_0 -overwrite -schemaFile $WORKING_DIR/config/transcript-schema.json -readerConfigFile $WORKING_DIR/config/csv-record-reader-config.json
+  Executing command: CreateSegment  -generatorConfigFile null -dataDir /Users/host1/Desktop/getting-started/data -format CSV -outDir /Users/host1/Desktop/getting-started/test2 -overwrite true -tableName transcript -segmentName transcript_0 -timeColumnName null -schemaFile /Users/host1/Desktop/getting-started/config/transcript-schema.json -readerConfigFile /Users/host1/Desktop/getting-started/config/csv-record-reader-config.json -enableStarTreeIndex false -starTreeIndexSpecFile null -hllS [...]
+  Accepted files: [file:/Users/host1/Desktop/getting-started/data/test.csv]
   Finished building StatsCollector!
   Collected stats for 4 documents
   Created dictionary for STRING column: studentID with cardinality: 1, max length in bytes: 4, range: null to null
@@ -208,30 +279,30 @@ In order to upload our data to Pinot cluster, we need to convert our CSV file to
   Start building IndexCreator!
   Finished records indexing in IndexCreator!
   Finished segment seal!
-  Converting segment: /Users/host1/Desktop/test2/transcript_0_0 to v3 format
-  v3 segment location for segment: transcript_0_0 is /Users/host1/Desktop/test2/transcript_0_0/v3
-  Deleting files in v1 segment directory: /Users/host1/Desktop/test2/transcript_0_0
+  Converting segment: /Users/host1/Desktop/getting-started/test2/transcript_0_0 to v3 format
+  v3 segment location for segment: transcript_0_0 is /Users/host1/Desktop/getting-started/test2/transcript_0_0/v3
+  Deleting files in v1 segment directory: /Users/host1/Desktop/getting-started/test2/transcript_0_0
   Driver, record read time : 1
   Driver, stats collector time : 0
   Driver, indexing time : 0
 
-Once we have the Pinot segment, we can upload this segment to our cluster:
+Once we have the Pinot Segment, we can upload it to our cluster:
 
 .. code-block:: none
 
-  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh UploadSegment -segmentDir /Users/host1/Desktop/test2/
+  $ ./pinot-admin.sh UploadSegment -segmentDir $WORKING_DIR/test2/
   Executing command: UploadSegment -controllerHost [controller_host] -controllerPort 9000 -segmentDir /Users/host1/Desktop/test2/
   Compressing segment transcript_0_0
   Uploading segment transcript_0_0.tar.gz
-  Sending request: http://[controller_host]:9000/v2/segments to controller: [controller_host], version: 0.1.0-SNAPSHOT-2c5d42a908213122ab0ad8b7ac9524fcf390e4cb
+  Sending request: http://[controller_host]:9000/v2/segments to controller: [controller_host], version: 0.2.0-SNAPSHOT-68092ab9eb83af173d725ec685c22ba4eb5bacf9
 
-You made it! Now we can query the data in Pinot:
+You did it! Now we can query the data in Pinot.
 
 To get the total number of rows in the table:
 
 .. code-block:: none
 
-  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh PostQuery -brokerPort 8000 -query "select count(*) from transcript"
+  $ ./pinot-admin.sh PostQuery -brokerPort 8000 -query "select count(*) from transcript"
   Executing command: PostQuery -brokerHost [controller_host] -brokerPort 8000 -query select count(*) from transcript
   Result: {"aggregationResults":[{"function":"count_star","value":"4"}],"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":1,"numSegmentsProcessed":1,"numSegmentsMatched":1,"numDocsScanned":4,"numEntriesScannedInFilter":0,"numEntriesScannedPostFilter":0,"numGroupsLimitReached":false,"totalDocs":4,"timeUsedMs":7,"segmentStatistics":[],"traceInfo":{}}
 
@@ -239,7 +310,7 @@ To get the average score of subject Maths:
 
 .. code-block:: none
 
-  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh PostQuery -brokerPort 8000 -query "select avg(score) from transcript where subject = \"Maths\""
+  $ ./pinot-admin.sh PostQuery -brokerPort 8000 -query "select avg(score) from transcript where subject = \"Maths\""
   Executing command: PostQuery -brokerHost [controller_host] -brokerPort 8000 -query select avg(score) from transcript where subject = "Maths"
   Result: {"aggregationResults":[{"function":"avg_score","value":"3.50000"}],"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":1,"numSegmentsProcessed":1,"numSegmentsMatched":1,"numDocsScanned":2,"numEntriesScannedInFilter":4,"numEntriesScannedPostFilter":2,"numGroupsLimitReached":false,"totalDocs":4,"timeUsedMs":33,"segmentStatistics":[],"traceInfo":{}}
 
@@ -247,6 +318,6 @@ To get the average score for Lucy Smith:
 
 .. code-block:: none
 
-  $ ./pinot-distribution/target/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/apache-pinot-incubating-0.1.0-SNAPSHOT-bin/bin/pinot-admin.sh PostQuery -brokerPort 8000 -query "select avg(score) from transcript where firstName = \"Lucy\" and lastName = \"Smith\""
+  $ ./pinot-admin.sh PostQuery -brokerPort 8000 -query "select avg(score) from transcript where firstName = \"Lucy\" and lastName = \"Smith\""
   Executing command: PostQuery -brokerHost [controller_host] -brokerPort 8000 -query select avg(score) from transcript where firstName = "Lucy" and lastName = "Smith"
   Result: {"aggregationResults":[{"function":"avg_score","value":"3.65000"}],"exceptions":[],"numServersQueried":1,"numServersResponded":1,"numSegmentsQueried":1,"numSegmentsProcessed":1,"numSegmentsMatched":1,"numDocsScanned":2,"numEntriesScannedInFilter":6,"numEntriesScannedPostFilter":2,"numGroupsLimitReached":false,"totalDocs":4,"timeUsedMs":67,"segmentStatistics":[],"traceInfo":{}}
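
For anyone trying the updated walkthrough, here is a minimal shell sketch (not part of the commit) that recreates the data and reader-config files from the patch and cross-checks the numbers the three queries should return. It is only a sketch under stated assumptions: WORKING_DIR falls back to a throwaway temporary directory rather than the /Users/host1/Desktop/getting-started path used in the doc, the variable names row_count, maths_avg and lucy_avg are made up for illustration, and awk stands in as an independent check. It does not talk to Pinot at all.

```shell
# Assumption: default to a temp directory so the sketch runs anywhere;
# export WORKING_DIR beforehand to use the walkthrough's own location.
WORKING_DIR="${WORKING_DIR:-$(mktemp -d)/getting-started}"
mkdir -p "$WORKING_DIR/data" "$WORKING_DIR/config"

# Transcript rows with no header line, copied from the patch.
cat > "$WORKING_DIR/data/test.csv" <<'EOF'
200,Lucy,Smith,Female,Maths,3.8
200,Lucy,Smith,Female,English,3.5
201,Bob,King,Male,Maths,3.2
202,Nick,Young,Male,Physics,3.6
EOF

# Reader config that supplies the header in place of a header line.
cat > "$WORKING_DIR/config/csv-record-reader-config.json" <<'EOF'
{
  "header":"studentID,firstName,lastName,gender,subject,score",
  "fileFormat":"CSV"
}
EOF

# Independent check of the three query results, computed with awk.
# Columns follow the header above: 2=firstName, 3=lastName, 5=subject, 6=score.
row_count=$(awk 'END { print NR }' "$WORKING_DIR/data/test.csv")
maths_avg=$(awk -F, '$5 == "Maths" { s += $6; n++ } END { printf "%.5f", s / n }' "$WORKING_DIR/data/test.csv")
lucy_avg=$(awk -F, '$2 == "Lucy" && $3 == "Smith" { s += $6; n++ } END { printf "%.5f", s / n }' "$WORKING_DIR/data/test.csv")

echo "rows=$row_count maths_avg=$maths_avg lucy_avg=$lucy_avg"
```

The printed values (4 rows, 3.50000 for Maths, 3.65000 for Lucy Smith) agree with the count_star and avg_score values in the Result lines of the patch, which is a quick way to confirm that test.csv was typed in correctly before building the segment.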

