Jackie-Jiang commented on code in PR #8812:
URL: https://github.com/apache/pinot/pull/8812#discussion_r890419704
##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-standalone/src/main/java/org/apache/pinot/plugin/ingestion/batch/standalone/SegmentGenerationJobRunner.java:
##########
@@ -168,7 +173,6 @@ public void run()
//Get list of files to process
String[] files = _inputDirFS.listFiles(_inputDirURI, true);
- //TODO: sort input files based on creation time
Review Comment:
Let's keep this TODO
##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-standalone/src/main/java/org/apache/pinot/plugin/ingestion/batch/standalone/SegmentGenerationJobRunner.java:
##########
@@ -160,6 +162,9 @@ public void init(SegmentGenerationJobSpec spec) {
LOGGER.info("Creating an executor service with {} threads(Job parallelism: {}, available cores: {}.)", numThreads,
jobParallelism, Runtime.getRuntime().availableProcessors());
_executorService = Executors.newFixedThreadPool(numThreads);
+
+ // Set up for recording multiple failures while building segments.
Review Comment:
(minor) The comment is a little confusing. Suggest updating it to reflect
that we record the first failure.
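
For illustration, recording only the first failure could look roughly like the sketch below. It assumes `_failure` is an `AtomicReference<Exception>`, which the `_failure.get() != null` check elsewhere in this PR suggests but this hunk does not confirm. The class name `FirstFailureRecorder` and its methods are hypothetical, not part of the PR.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not code from the PR: keeps the first failure and ignores later ones.
class FirstFailureRecorder {
  // Assumption: the runner holds its failure in an AtomicReference like this one.
  private final AtomicReference<Exception> _failure = new AtomicReference<>();

  // Stores the exception only if nothing has been recorded yet.
  // Returns true when this call recorded the first failure.
  boolean record(Exception e) {
    return _failure.compareAndSet(null, e);
  }

  // Returns the first recorded failure, or null if there has been none.
  Exception get() {
    return _failure.get();
  }

  public static void main(String[] args) {
    FirstFailureRecorder recorder = new FirstFailureRecorder();
    recorder.record(new RuntimeException("first"));
    recorder.record(new RuntimeException("second")); // ignored, a failure is already recorded
    System.out.println(recorder.get().getMessage());
  }
}
```

With `compareAndSet(null, e)`, later failures never overwrite the stored one, so a code comment saying "record the first failure" would match the behavior.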
##########
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-standalone/src/main/java/org/apache/pinot/plugin/ingestion/batch/standalone/SegmentGenerationJobRunner.java:
##########
@@ -253,6 +264,15 @@ private void submitSegmentGenTask(File localTempDir, URI inputFileURI, int seqId
taskSpec.setFailOnEmptySegment(_spec.isFailOnEmptySegment());
taskSpec.setCustomProperty(BatchConfigProperties.INPUT_DATA_FILE_URI_KEY, inputFileURI.toString());
+ // If there's already been a failure, log and skip this file. Do this check right before the
+ // submit to reduce odds of starting a new segment when a failure is recorded right before the
+ // submit.
+ if (_failure.get() != null) {
+ LOGGER.info("Skipping Segment Generation Task for {} due to previous failures", inputFileURI);
+ _segmentCreationTaskCountDownLatch.countDown();
Review Comment:
(minor) This countDown() is not required because the previous failure should
have already drained the latch.
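
A rough sketch of the reasoning, assuming the failure path drains the latch to zero when it records the first failure (as the comment above implies; the PR's actual failure handler is not shown in this hunk). The class `LatchDrainSketch` and its method names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the runner's failure handling, not code from the PR.
class LatchDrainSketch {
  private final CountDownLatch _latch;
  private final AtomicReference<Exception> _failure = new AtomicReference<>();

  LatchDrainSketch(int numTasks) {
    _latch = new CountDownLatch(numTasks);
  }

  // Assumed failure path: record the first failure, then drain the latch
  // so that a thread blocked in await() is released immediately.
  void onTaskFailure(Exception e) {
    if (_failure.compareAndSet(null, e)) {
      while (_latch.getCount() > 0) {
        _latch.countDown();
      }
    }
  }

  // Mirrors the check done right before submitting a new task.
  boolean shouldSkip() {
    return _failure.get() != null;
  }

  long remaining() {
    return _latch.getCount();
  }

  public static void main(String[] args) {
    LatchDrainSketch sketch = new LatchDrainSketch(3);
    sketch.onTaskFailure(new RuntimeException("segment build failed"));
    // The latch is already at zero here, so a skipped submission has nothing left to count down.
    System.out.println("skip=" + sketch.shouldSkip() + ", remaining=" + sketch.remaining());
  }
}
```

Calling `countDown()` on a `CountDownLatch` that is already at zero has no effect, so under this assumption the extra call is harmless but redundant.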
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]