sunithabeeram commented on a change in pull request #3927: Add doc for Customizing Pinot
URL: https://github.com/apache/incubator-pinot/pull/3927#discussion_r263562852
 
 

 ##########
 File path: docs/customizations.rst
 ##########
 @@ -18,5 +18,152 @@
 ..
 
 
-Customization points in Pinot
-=============================
\ No newline at end of file
+Customizing Pinot
+===================
+
+There are a lot of places in Pinot which can be customized depending on the infrastructure or the use case. Below is a list of such customization points.
+
+
+.. image:: img/CustomizingPinot.png
+
+
+1. Generating Pinot segments
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Typically, data files will be available on some offline data storage, such as HDFS, and a Hadoop job can be written to read the data and create the segment. The `SegmentCreationJob <https://github.com/apache/incubator-pinot/blob/master/pinot-hadoop/src/main/java/org/apache/pinot/hadoop/job/SegmentCreationJob.java>`_ class contains a hadoop job for creating segments. This is a map only job, and the mapper can be found in `SegmentCreationMapper <https://github.com/apache/incubator-pinot/blob/master/pinot-hadoop/src/main/java/org/apache/pinot/hadoop/job/mapper/SegmentCreationMapper.java>`_. You can override the SegmentCreationMapper with a custom mapper by overriding the SegmentCreationJob::getMapperClass() method.
+
+New offline data is typically available in a daily or hourly frequency. You can schedule your jobs to run periodically using either cron or a scheduler such as `Azkaban <https://azkaban.github.io/>`_.
 
 Review comment:
   We cannot comment on how frequently offline data is available. You can just 
say that the jobs can be run daily or hourly to push offline data to Pinot.

