xingyc15 opened a new issue, #8641:
URL: https://github.com/apache/pinot/issues/8641
A Pinot segment creation failure occurred when I ran a standalone script for
offline ingestion. The problem is that this failure did not raise any
exception; it only printed an error, and the task was marked as succeeded. We run
this data ingestion as an Airflow task, and the missing exception significantly
delayed our debugging. Our error log is:
> [2022-04-28 00:32:33,981] {pod_launcher.py:149} INFO - Start building
IndexCreator!
[2022-04-28 00:32:36,377] {pod_launcher.py:149} INFO - Failed to generate
Pinot segment for file -
s3://deepmap-anga-production/metrics/etl_staging/pinot_ingest/map_making_metrics/date=2022-02-22/part-0-0
[2022-04-28 00:32:36,377] {pod_launcher.py:149} INFO -
shaded.com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input:
was expecting closing quote for a string value
[2022-04-28 00:32:36,377] {pod_launcher.py:149} INFO - at [Source:
(String)"{"extra":"[13508528,13508529,13508604,13594110,13594112,13508467,13508479,13563695,13594105,13508475,13508489,13508494,13594107,13508483,13594109]","missed":"[6900744,6900745,6900746,6900747,6900748,6900748,6900804,6900804,6900804,6900805,6900806,6901088,6901088,6901089,6901090,6901470,6908481,6908886,6911028,6911030,7647532,7647592,8062355,8062356,8062357,8062358,8062359,8062360,8062364,8062365,8062366,8091813,8091819,8091821,8091822,8091823,8091825,8091827,8091828,8091829,8091830,8091838,80918"[truncated
1000 chars]; line: 1, column: 3001]
[2022-04-28 00:32:36,377] {pod_launcher.py:149} INFO - at
shaded.com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:664)
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
[2022-04-28 00:32:36,377] {pod_launcher.py:149} INFO - at
shaded.com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:2051)
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
[2022-04-28 00:32:36,378] {pod_launcher.py:149} INFO - at
shaded.com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString(ReaderBasedJsonParser.java:2038)
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
[2022-04-28 00:32:36,378] {pod_launcher.py:149} INFO - at
shaded.com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:293)
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
[2022-04-28 00:32:36,378] {pod_launcher.py:149} INFO - at
shaded.com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:267)
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
[2022-04-28 00:32:36,378] {pod_launcher.py:149} INFO - at
shaded.com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:68)
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
[2022-04-28 00:32:36,378] {pod_launcher.py:149} INFO - at
shaded.com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
[2022-04-28 00:32:36,378] {pod_launcher.py:149} INFO - at
shaded.com.fasterxml.jackson.databind.ObjectReader._bindAsTree(ObjectReader.java:1770)
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
[2022-04-28 00:32:36,378] {pod_launcher.py:149} INFO - at
shaded.com.fasterxml.jackson.databind.ObjectReader._bindAndCloseAsTree(ObjectReader.java:1735)
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
[2022-04-28 00:32:36,378] {pod_launcher.py:149} INFO - at
shaded.com.fasterxml.jackson.databind.ObjectReader.readTree(ObjectReader.java:1422)
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
[2022-04-28 00:32:36,379] {pod_launcher.py:149} INFO - at
org.apache.pinot.spi.utils.JsonUtils.stringToJsonNode(JsonUtils.java:87)
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
[2022-04-28 00:32:36,379] {pod_launcher.py:149} INFO - at
org.apache.pinot.segment.local.segment.creator.impl.inv.json.BaseJsonIndexCreator.add(BaseJsonIndexCreator.java:92)
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
[2022-04-28 00:32:36,379] {pod_launcher.py:149} INFO - at
org.apache.pinot.segment.local.segment.creator.impl.SegmentColumnarIndexCreator.indexRow(SegmentColumnarIndexCreator.java:402)
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
[2022-04-28 00:32:36,379] {pod_launcher.py:149} INFO - at
org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:243)
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
[2022-04-28 00:32:36,379] {pod_launcher.py:149} INFO - at
org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:111)
~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
[2022-04-28 00:32:36,379] {pod_launcher.py:149} INFO - at
org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:263)
~[pinot-batch-ingestion-standalone-0.8.0-shaded.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
[2022-04-28 00:32:36,379] {pod_launcher.py:149} INFO - at
java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
[2022-04-28 00:32:36,379] {pod_launcher.py:149} INFO - at
java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
[2022-04-28 00:32:36,380] {pod_launcher.py:149} INFO - at
java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
[2022-04-28 00:32:36,380] {pod_launcher.py:149} INFO - at
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
[2022-04-28 00:32:36,380] {pod_launcher.py:149} INFO - at
java.lang.Thread.run(Unknown Source) [?:?]
[2022-04-28 00:32:36,383] {pod_launcher.py:149} INFO - Trying to create
instance for class
org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner
[2022-04-28 00:32:36,383] {pod_launcher.py:149} INFO - Initializing PinotFS
for scheme s3, classname org.apache.pinot.plugin.filesystem.S3PinotFS
Here is what I found in the source code
[code](https://github.com/apache/pinot/blob/1e90f141282e40f819de806920cc2a836e0e35ba/pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-standalone/src/main/java/org/apache/pinot/plugin/ingestion/batch/standalone/SegmentGenerationJobRunner.java#L284):
this function does not raise the exception; it only prints an
error. Can you fix this? I think it should raise an error and fail the
process.
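
To illustrate the behavior I am asking for, here is a minimal sketch, not Pinot's actual code. The class `FailFastJobSketch` and the method `runAll` are hypothetical names I made up for this example. It assumes the tasks are submitted to an executor, as the stack trace above suggests, and shows how failures from those tasks could be collected and rethrown so that the job as a whole fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from the Pinot code base.
public class FailFastJobSketch {

  /** Runs every task to completion, then throws if at least one of them failed. */
  public static void runAll(List<Runnable> tasks, int numThreads) {
    ExecutorService executor = Executors.newFixedThreadPool(numThreads);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (Runnable task : tasks) {
        futures.add(executor.submit(task));
      }
      List<Throwable> failures = new ArrayList<>();
      for (Future<?> future : futures) {
        try {
          // get() rethrows whatever the task threw, wrapped in ExecutionException.
          future.get();
        } catch (ExecutionException e) {
          failures.add(e.getCause());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new RuntimeException("Interrupted while waiting for tasks", e);
        }
      }
      if (!failures.isEmpty()) {
        // Surface the first failure as the cause instead of only logging it.
        throw new RuntimeException(
            failures.size() + " of " + tasks.size() + " task(s) failed", failures.get(0));
      }
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    List<Runnable> tasks = new ArrayList<>();
    tasks.add(() -> System.out.println("segment 0 built"));
    tasks.add(() -> {
      throw new IllegalStateException("simulated segment generation failure");
    });
    try {
      runAll(tasks, 2);
    } catch (RuntimeException e) {
      // A real entry point would let this propagate (or exit non-zero)
      // so that a scheduler such as Airflow marks the task as failed.
      System.err.println("Job failed: " + e.getMessage());
    }
  }
}
```

With something along these lines, a single bad input file would make the whole ingestion job fail visibly instead of being reported as a success.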