npawar commented on a change in pull request #3927: Add doc for Customizing Pinot
URL: https://github.com/apache/incubator-pinot/pull/3927#discussion_r263589458
########## File path: docs/customizations.rst ##########

@@ -18,5 +18,152 @@
 ..
-Customization points in Pinot
-=============================
\ No newline at end of file
+Customizing Pinot
+===================
+
+Many parts of Pinot can be customized to fit your infrastructure or use case. The customization points are listed below.
+
+
+.. image:: img/CustomizingPinot.png
+
+
+1. Generating Pinot segments
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Typically, data files are available in offline storage such as HDFS, and a Hadoop job can read the data and create the segments. The `SegmentCreationJob <https://github.com/apache/incubator-pinot/blob/master/pinot-hadoop/src/main/java/org/apache/pinot/hadoop/job/SegmentCreationJob.java>`_ class implements a Hadoop job for creating segments. It is a map-only job, and its mapper is `SegmentCreationMapper <https://github.com/apache/incubator-pinot/blob/master/pinot-hadoop/src/main/java/org/apache/pinot/hadoop/job/mapper/SegmentCreationMapper.java>`_. To use a custom mapper instead of SegmentCreationMapper, override the ``SegmentCreationJob::getMapperClass()`` method.
+
+New offline data typically arrives at a daily or hourly frequency. You can schedule your jobs to run periodically using cron or a workflow scheduler such as `Azkaban <https://azkaban.github.io/>`_.
+
+
+2. Pluggable storage
+^^^^^^^^^^^^^^^^^^^^
+We expect segment storage to be shared across all controllers of the same cluster, for example an NFS mount. To store segments in a data layer of your choice, such as Azure or S3, write your own implementation of PinotFS. See the `pluggable storage documentation <https://pinot.readthedocs.io/en/latest/pluggable_storage.html>`_ for more details.

Review comment:
Fixed
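To make the `getMapperClass()` customization point from section 1 more concrete, here is a minimal, self-contained Java sketch of the pattern. All class names (`SketchSegmentCreationJob`, `SketchSegmentCreationMapper`, `MyCustomMapper`, `MyCustomSegmentCreationJob`) and the string-based `map` method are hypothetical stand-ins, not Pinot's or Hadoop's API; the real job configures a Hadoop mapper and runs inside Hadoop. The sketch only shows the shape of the hook: the job asks an overridable method which mapper class to instantiate, and a subclass swaps in its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration only. These are NOT the
// real org.apache.pinot.hadoop.job.SegmentCreationJob / SegmentCreationMapper
// classes, which run inside Hadoop and have different signatures.

/** Default mapper: turns one input file path into one "segment" name. */
class SketchSegmentCreationMapper {
    String map(String inputFile) {
        return "segment-from-" + inputFile;
    }
}

/** Map-only "job" that asks an overridable hook which mapper class to use. */
class SketchSegmentCreationJob {
    // The customization point: subclasses return a different mapper class.
    protected Class<? extends SketchSegmentCreationMapper> getMapperClass() {
        return SketchSegmentCreationMapper.class;
    }

    // Instantiate the configured mapper reflectively and apply it to every input.
    List<String> run(List<String> inputFiles) {
        SketchSegmentCreationMapper mapper;
        try {
            mapper = getMapperClass().getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate mapper", e);
        }
        List<String> segments = new ArrayList<>();
        for (String file : inputFiles) {
            segments.add(mapper.map(file));
        }
        return segments;
    }
}

/** Hypothetical custom mapper that tags each segment name with a prefix. */
class MyCustomMapper extends SketchSegmentCreationMapper {
    @Override
    String map(String inputFile) {
        return "custom-" + super.map(inputFile);
    }
}

/** Custom job: the only change is which mapper class the hook returns. */
class MyCustomSegmentCreationJob extends SketchSegmentCreationJob {
    @Override
    protected Class<? extends SketchSegmentCreationMapper> getMapperClass() {
        return MyCustomMapper.class;
    }
}
```

Returning a `Class` rather than an instance mirrors how Hadoop works: the framework instantiates the configured mapper class reflectively on each task, so the sketch does the same with `getDeclaredConstructor().newInstance()`.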
