bossjie opened a new issue #7562:
URL: https://github.com/apache/pinot/issues/7562
Target:
Use the example Spark ingestion job from the Pinot repo to push data from HDFS to Pinot, targeting data under `path/2014/01/02/xxx.avro`.
Environment:
Pinot 0.8, built with Java 1.8
What I did:
In `sparkIngestionJobSpec_1.yaml` I tried the inputs below, and they do not work as expected.

```yaml
inputDirURI: 'hdfs://namenode.com:8020/user/chxing/pinot/airlineStats/rawdata/2014/01/02'
includeFileNamePattern: 'glob:**.avro'
```

**Result: the target Pinot table ends up with multiple days of data, even though the files under `01/02` hold only one day's data.**
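For what it's worth, this difference lines up with `java.nio.file.PathMatcher` glob semantics, which the ingestion job appears to use for `includeFileNamePattern`: `**` crosses directory separators while `*` does not, so what matches depends on whether the pattern is tested against a bare file name or a longer path. A minimal sketch (the paths are made up for illustration, not taken from the job):

```java
import java.nio.file.FileSystems;
import java.nio.file.Paths;

public class GlobDemo {
    // Match a path string against a java.nio glob pattern.
    static boolean matches(String pattern, String path) {
        return FileSystems.getDefault()
                .getPathMatcher(pattern)
                .matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // '**' crosses directory separators, so it matches nested paths
        System.out.println(matches("glob:**.avro", "2014/01/02/part-0.avro")); // true
        // '*' stops at '/', so it never matches a path containing directories
        System.out.println(matches("glob:*.avro", "2014/01/02/part-0.avro"));  // false
        // '*' does match a bare file name
        System.out.println(matches("glob:*.avro", "part-0.avro"));             // true
    }
}
```

So if the job matches the pattern against paths rather than bare file names, `glob:**.avro` would match more than intended while `glob:*.avro` would match nothing.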
With a single `*` the job fails instead:

```yaml
inputDirURI: 'hdfs://namenode.com:8020/user/chxing/pinot/airlineStats/rawdata/2014/01/02'
includeFileNamePattern: 'glob:*.avro'
```

or

```yaml
inputDirURI: 'hdfs://namenode.com:8020/user/chxing/pinot/airlineStats/rawdata/2014/01/02/'
includeFileNamePattern: 'glob:*.avro'
```

**Result: the job fails with the exception below.**
```
command.LaunchDataIngestionJobCommand: Got exception to kick off standalone data ingestion job -
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:144)
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:113)
	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:132)
	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:67)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:673)
Caused by: java.lang.IllegalArgumentException: Positive number of partitions required
	at org.apache.spark.rdd.ParallelCollectionRDD$.slice(ParallelCollectionRDD.scala:119)
	at org.apache.spark.rdd.ParallelCollectionRDD.getPartitions(ParallelCollectionRDD.scala:97)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2146)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:925)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.foreach(RDD.scala:925)
	at org.apache.spark.api.java.JavaRDDLike$class.foreach(JavaRDDLike.scala:351)
	at org.apache.spark.api.java.AbstractJavaRDDLike.foreach(JavaRDDLike.scala:45)
	at org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner.run(SparkSegmentGenerationJobRunner.java:245)
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:142)
	... 8 more
```
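The `Positive number of partitions required` error is thrown by Spark's `ParallelCollectionRDD.slice` when an RDD is created with zero partitions; that would be consistent with the pattern matching zero input files (so a parallelize over the file list gets a partition count of 0). If `glob:*.avro` is tested against full HDFS paths, it can never match, since `*` cannot cross `/`. A hedged sketch of that filtering step, assuming path-based matching (`countMatches` is a hypothetical helper, not Pinot code):

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;

public class EmptyMatchDemo {
    // Hypothetical stand-in for the job's file filtering: count the absolute
    // paths that match the includeFileNamePattern glob.
    static long countMatches(List<String> absolutePaths, String globPattern) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(globPattern);
        return absolutePaths.stream()
                .filter(p -> matcher.matches(Paths.get(p)))
                .count();
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "/user/chxing/pinot/airlineStats/rawdata/2014/01/02/a.avro",
                "/user/chxing/pinot/airlineStats/rawdata/2014/01/02/b.avro");

        // 'glob:*.avro' matches no absolute path: '*' cannot cross '/'.
        // Zero matched files would mean parallelize(files, 0) downstream.
        System.out.println(countMatches(files, "glob:*.avro"));    // 0
        // 'glob:**/*.avro' matches files in any subdirectory
        System.out.println(countMatches(files, "glob:**/*.avro")); // 2
    }
}
```

If this reading is right, a pattern like `glob:**/*.avro` (as used in the Pinot examples) combined with the single-day `inputDirURI` may be worth trying.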
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]