npawar commented on a change in pull request #3927: Add doc for Customizing Pinot
URL: https://github.com/apache/incubator-pinot/pull/3927#discussion_r263589458
 
 

 ##########
 File path: docs/customizations.rst
 ##########
 @@ -18,5 +18,152 @@
 ..
 
 
-Customization points in Pinot
-=============================
\ No newline at end of file
+Customizing Pinot
+===================
+
+Pinot can be customized in many places, depending on the infrastructure or the use case. The customization points are listed below.
+
+
+.. image:: img/CustomizingPinot.png
+
+
+1. Generating Pinot segments
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Typically, data files are available on offline data storage such as HDFS, and a Hadoop job can be written to read the data and create the segments. The `SegmentCreationJob <https://github.com/apache/incubator-pinot/blob/master/pinot-hadoop/src/main/java/org/apache/pinot/hadoop/job/SegmentCreationJob.java>`_ class contains a Hadoop job for creating segments. This is a map-only job, and its mapper can be found in `SegmentCreationMapper <https://github.com/apache/incubator-pinot/blob/master/pinot-hadoop/src/main/java/org/apache/pinot/hadoop/job/mapper/SegmentCreationMapper.java>`_. You can replace the SegmentCreationMapper with a custom mapper by overriding the SegmentCreationJob::getMapperClass() method.
+
+New offline data typically becomes available daily or hourly. You can schedule your jobs to run periodically using either cron or a scheduler such as `Azkaban <https://azkaban.github.io/>`_.
+
+
+2. Pluggable storage
+^^^^^^^^^^^^^^^^^^^^
+We expect segment storage, such as NFS, to be shared across all controllers of the same cluster. You can write your own implementation of PinotFS to store segments in a data layer of your choice, for example Azure or S3. Please refer to `this doc <https://pinot.readthedocs.io/en/latest/pluggable_storage.html>`_ for more details.
 
 Review comment:
   Fixed
