bothra90 opened a new issue #11291:
URL: https://github.com/apache/druid/issues/11291
Ingesting nested JSON data fails when the flattenSpec uses the JSONPath
`length()` function.
### Description
The ingestion process fails with the following stacktrace:
```
2021-05-22T01:56:09,614 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Encountered exception in BUILD_SEGMENTS.
java.lang.ClassCastException: java.lang.Integer cannot be cast to com.fasterxml.jackson.databind.JsonNode
    at org.apache.druid.java.util.common.parsers.JSONFlattenerMaker.lambda$makeJsonPathExtractor$2(JSONFlattenerMaker.java:89) ~[druid-core-0.21.0-iap3.jar:0.21.0-iap3]
    at org.apache.druid.java.util.common.parsers.ObjectFlatteners$1$1.get(ObjectFlatteners.java:116) ~[druid-core-0.21.0-iap3.jar:0.21.0-iap3]
    at java.util.Collections$UnmodifiableMap.get(Collections.java:1456) ~[?:1.8.0_262]
    at org.apache.druid.data.input.MapBasedRow.getRaw(MapBasedRow.java:87) ~[druid-core-0.21.0-iap3.jar:0.21.0-iap3]
    at org.apache.druid.segment.incremental.IncrementalIndex.toIncrementalIndexRow(IncrementalIndex.java:544) ~[druid-processing-0.21.0-iap3.jar:0.21.0-iap3]
    at org.apache.druid.segment.incremental.IncrementalIndex.add(IncrementalIndex.java:480) ~[druid-processing-0.21.0-iap3.jar:0.21.0-iap3]
    at org.apache.druid.segment.realtime.plumber.Sink.add(Sink.java:179) ~[druid-server-0.21.0-iap3.jar:0.21.0-iap3]
    at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.add(AppenderatorImpl.java:261) ~[druid-server-0.21.0-iap3.jar:0.21.0-iap3]
    at org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver.append(BaseAppenderatorDriver.java:409) ~[druid-server-0.21.0-iap3.jar:0.21.0-iap3]
    at org.apache.druid.segment.realtime.appenderator.BatchAppenderatorDriver.add(BatchAppenderatorDriver.java:114) ~[druid-server-0.21.0-iap3.jar:0.21.0-iap3]
    at org.apache.druid.indexing.common.task.InputSourceProcessor.process(InputSourceProcessor.java:106) ~[druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
    at org.apache.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:878) ~[druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
    at org.apache.druid.indexing.common.task.IndexTask.runTask(IndexTask.java:494) [druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
    at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:152) [druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
    at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runSequential(ParallelIndexSupervisorTask.java:964) [druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
    at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runTask(ParallelIndexSupervisorTask.java:445) [druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
    at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:152) [druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
    at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:451) [druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
    at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:423) [druid-indexing-service-0.21.0-iap3.jar:0.21.0-iap3]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_262]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_262]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_262]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
```
Input:
```
{
  ...
  "flattenSpec": {
    "fields": [
      {
        "type": "path",
        "name": "count",
        "expr": "$.team.players.length()"
      }
    ]
  }
  ...
}
```
Replacing the JSONPath expression in the flattenSpec with the following
jackson-jq expression avoids the problem.
```
{
  ...
  "flattenSpec": {
    "fields": [
      {
        "type": "jq",
        "name": "count",
        "expr": ".team.players | length"
      }
    ]
  }
  ...
}
```
We want to use JSONPath rather than jq because it also applies to non-JSON
files.
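The `ClassCastException` in the stacktrace suggests that the result of the path expression is cast to `JsonNode` unconditionally, while a function such as `length()` yields a plain `Integer`. The following is a minimal, self-contained sketch of that failure mode and of one possible defensive handling. It is not Druid's code: `Node`, `evaluate`, `extractUnchecked`, and `extractChecked` are hypothetical stand-ins chosen for illustration, and the actual logic in `JSONFlattenerMaker` may differ.

```java
import java.util.Arrays;
import java.util.List;

public class LengthCastSketch {
    // Hypothetical stand-in for a JSON tree node such as Jackson's JsonNode.
    static final class Node {
        final Object value;

        Node(Object value) {
            this.value = value;
        }
    }

    // Hypothetical stand-in for a path evaluator (assumption, not a real API):
    // a plain path yields a Node, a function such as length() yields an Integer.
    static Object evaluate(List<String> players, boolean useLengthFunction) {
        if (useLengthFunction) {
            return Integer.valueOf(players.size());
        }
        return new Node(players);
    }

    // Casts the result unconditionally, which is the failure mode the trace points to.
    static Object extractUnchecked(Object result) {
        return ((Node) result).value; // throws ClassCastException for an Integer
    }

    // Defensive variant: unwrap nodes, pass any other value through unchanged.
    static Object extractChecked(Object result) {
        return (result instanceof Node) ? ((Node) result).value : result;
    }

    public static void main(String[] args) {
        List<String> players = Arrays.asList("a", "b", "c");
        Object length = evaluate(players, true);

        try {
            extractUnchecked(length);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed: " + e.getMessage());
        }

        System.out.println("checked extract: " + extractChecked(length));
    }
}
```

In this sketch the unchecked variant fails for the `length()` result, while the checked variant returns the integer 3 for the three-element list. Whether a type check of this kind is the right fix in Druid itself is an open question for the maintainers.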
### Affected Version
Imply version `2021.01-2 LTS`