sunithabeeram commented on a change in pull request #3927: Add doc for Customizing Pinot
URL: https://github.com/apache/incubator-pinot/pull/3927#discussion_r263562852
##########
File path: docs/customizations.rst
##########
@@ -18,5 +18,152 @@
 ..
-Customization points in Pinot
-=============================
\ No newline at end of file
+Customizing Pinot
+===================
+
+There are a lot of places in Pinot which can be customized depending on the infrastructure or the use case. Below is a list of such customization points.
+
+
+.. image:: img/CustomizingPinot.png
+
+
+1. Generating Pinot segments
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Typically, data files will be available on some offline data storage, such as HDFS, and a Hadoop job can be written to read the data and create the segment. The `SegmentCreationJob <https://github.com/apache/incubator-pinot/blob/master/pinot-hadoop/src/main/java/org/apache/pinot/hadoop/job/SegmentCreationJob.java>`_ class contains a hadoop job for creating segments. This is a map only job, and the mapper can be found in `SegmentCreationMapper <https://github.com/apache/incubator-pinot/blob/master/pinot-hadoop/src/main/java/org/apache/pinot/hadoop/job/mapper/SegmentCreationMapper.java>`_. You can override the SegmentCreationMapper with a custom mapper by overriding the SegmentCreationJob::getMapperClass() method.
+
+New offline data is typically available in a daily or hourly frequency. You can schedule your jobs to run periodically using either cron or a scheduler such as `Azkaban <https://azkaban.github.io/>`_.

Review comment:
We cannot comment on how frequently offline data is available. You can just say that the jobs can be run daily or hourly to push offline data to Pinot.
