fx19880617 opened a new pull request #4742: Adding bootstrap mode for Pinot-hadoop job to output segments into relative directories.
URL: https://github.com/apache/incubator-pinot/pull/4742

- Skip hidden files and temp files created by computation frameworks such as Hadoop and Spark.
- Add a `job.bootstrap` flag so that the output directory mirrors each input file's path relative to the input directory.

**job.properties**
```
input.dir=/path/to/input
output.dir=/path/to/output
job.bootstrap=true
segment.table.name=mytable
```

The directory structure under `/path/to/input` looks like this:
```
/path/to/input/yyyy=2019/mm=10/dd=1/part-0-r-aaa.avro
/path/to/input/yyyy=2019/mm=10/dd=2/part-0-r-bbb.avro
/path/to/input/yyyy=2019/mm=10/dd=3/part-0-r-ccc.avro
```

With `job.bootstrap=true`, we expect the output directory structure to be:
```
/path/to/output/yyyy=2019/mm=10/dd=1/mytable_0.tar.gz
/path/to/output/yyyy=2019/mm=10/dd=2/mytable_1.tar.gz
/path/to/output/yyyy=2019/mm=10/dd=3/mytable_2.tar.gz
```

With the old job (no bootstrap mode), every segment lands directly in the output directory:
```
/path/to/output/mytable_0.tar.gz
/path/to/output/mytable_1.tar.gz
/path/to/output/mytable_2.tar.gz
```
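The path mapping described above can be sketched in plain Java with `java.nio.file`. This is a minimal illustration under stated assumptions, not the PR's implementation: the class and method names (`BootstrapPathSketch`, `isHiddenOrTemp`, `resolveOutputDir`) are hypothetical, the hidden/temp-file rule (names starting with `.` or `_`, the same convention Hadoop's `FileInputFormat` uses) is assumed, and the real job operates on Hadoop `Path`/`FileSystem` objects rather than local paths. The sequence-numbered `mytable_N.tar.gz` names simply mimic the example output.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class BootstrapPathSketch {

  // Assumed rule (hypothetical): names starting with "." or "_" are hidden/temp files,
  // e.g. "_SUCCESS", "_temporary", ".part-0.crc", following Hadoop's hidden-file convention.
  static boolean isHiddenOrTemp(Path file) {
    String name = file.getFileName().toString();
    return name.startsWith(".") || name.startsWith("_");
  }

  // Returns the directory that the segment built from inputFile is written to.
  // With bootstrap enabled, the input file's parent directory, taken relative to inputDir,
  // is recreated under outputDir; otherwise every segment goes directly into outputDir.
  static Path resolveOutputDir(Path inputDir, Path inputFile, Path outputDir, boolean bootstrap) {
    if (!bootstrap) {
      return outputDir;
    }
    // For a file directly inside inputDir this is an empty path, and resolve() returns outputDir.
    Path relativeParent = inputDir.relativize(inputFile.getParent());
    return outputDir.resolve(relativeParent);
  }

  public static void main(String[] args) {
    Path inputDir = Paths.get("/path/to/input");
    Path outputDir = Paths.get("/path/to/output");
    List<Path> inputFiles = List.of(
        Paths.get("/path/to/input/yyyy=2019/mm=10/dd=1/part-0-r-aaa.avro"),
        Paths.get("/path/to/input/yyyy=2019/mm=10/dd=1/_SUCCESS"), // skipped as a temp/marker file
        Paths.get("/path/to/input/yyyy=2019/mm=10/dd=2/part-0-r-bbb.avro"),
        Paths.get("/path/to/input/yyyy=2019/mm=10/dd=3/part-0-r-ccc.avro"));

    int sequenceId = 0;
    for (Path file : inputFiles) {
      if (isHiddenOrTemp(file)) {
        continue; // skipped files do not consume a sequence id
      }
      Path dir = resolveOutputDir(inputDir, file, outputDir, true);
      System.out.println(dir.resolve("mytable_" + sequenceId + ".tar.gz"));
      sequenceId++;
    }
  }
}
```

On a Unix-like system, running `main` prints the three bootstrap-mode output paths listed in the PR description. `relativize` strips the input-directory prefix, so the `yyyy=.../mm=.../dd=...` partition directories are recreated under the output directory; passing `false` for `bootstrap` reproduces the old flat layout.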
