bossjie opened a new issue #7562:
URL: https://github.com/apache/pinot/issues/7562
Target:
Use the example Spark ingestion job from the Pinot repo to push data from HDFS to Pinot, targeting data under `path/2014/01/02/xxx.avro`.
Environment:
Pinot 0.8, built with Java 1.8
What I did:
In `sparkIngestionJobSpec_1.yaml` I tried the inputs below, and they do not work as expected.

```yaml
inputDirURI: 'hdfs://namenode.com:8020/user/chxing/pinot/airlineStats/rawdata/2014/01/02'
includeFileNamePattern: 'glob:**.avro'
```

**Result: the target Pinot table ends up with multiple days of data, even though the files under `01/02` hold only one day's data.**
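For what it's worth, this difference lines up with `java.nio.file.PathMatcher` glob semantics, which the ingestion job appears to use for `includeFileNamePattern`: `**` crosses directory separators while `*` does not, so what matches depends on whether the pattern is tested against a bare file name or a longer path. A minimal sketch (the paths are made up for illustration, not taken from the job):

```java
import java.nio.file.FileSystems;
import java.nio.file.Paths;

public class GlobDemo {
    // Match a path string against a java.nio glob pattern.
    static boolean matches(String pattern, String path) {
        return FileSystems.getDefault()
                .getPathMatcher(pattern)
                .matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // '**' crosses directory separators, so it matches nested paths
        System.out.println(matches("glob:**.avro", "2014/01/02/part-0.avro")); // true
        // '*' stops at '/', so it never matches a path containing directories
        System.out.println(matches("glob:*.avro", "2014/01/02/part-0.avro"));  // false
        // '*' does match a bare file name
        System.out.println(matches("glob:*.avro", "part-0.avro"));             // true
    }
}
```

So if the job matches the pattern against paths rather than bare file names, `glob:**.avro` would match more than intended while `glob:*.avro` would match nothing.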
With a single `*` the job fails instead:

```yaml
inputDirURI: 'hdfs://namenode.com:8020/user/chxing/pinot/airlineStats/rawdata/2014/01/02'
includeFileNamePattern: 'glob:*.avro'
```

or

```yaml
inputDirURI: 'hdfs://namenode.com:8020/user/chxing/pinot/airlineStats/rawdata/2014/01/02/'
includeFileNamePattern: 'glob:*.avro'
```

**Result: the job fails with the exception below.**
```
command.LaunchDataIngestionJobCommand: Got exception to kick off standalone data ingestion job -
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:144)
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:113)
	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:132)
	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:67)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:673)
Caused by: java.lang.IllegalArgumentException: Positive number of partitions required
	at org.apache.spark.rdd.ParallelCollectionRDD$.slice(ParallelCollectionRDD.scala:119)
	at org.apache.spark.rdd.ParallelCollectionRDD.getPartitions(ParallelCollectionRDD.scala:97)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2146)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:927)
	at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:925)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.foreach(RDD.scala:925)
	at org.apache.spark.api.java.JavaRDDLike$class.foreach(JavaRDDLike.scala:351)
	at org.apache.spark.api.java.AbstractJavaRDDLike.foreach(JavaRDDLike.scala:45)
	at org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner.run(SparkSegmentGenerationJobRunner.java:245)
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:142)
	... 8 more
```
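The `Positive number of partitions required` error is thrown by Spark's `ParallelCollectionRDD.slice` when an RDD is created with zero partitions; that would be consistent with the pattern matching zero input files (so a parallelize over the file list gets a partition count of 0). If `glob:*.avro` is tested against full HDFS paths, it can never match, since `*` cannot cross `/`. A hedged sketch of that filtering step, assuming path-based matching (`countMatches` is a hypothetical helper, not Pinot code):

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;

public class EmptyMatchDemo {
    // Hypothetical stand-in for the job's file filtering: count the absolute
    // paths that match the includeFileNamePattern glob.
    static long countMatches(List<String> absolutePaths, String globPattern) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(globPattern);
        return absolutePaths.stream()
                .filter(p -> matcher.matches(Paths.get(p)))
                .count();
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "/user/chxing/pinot/airlineStats/rawdata/2014/01/02/a.avro",
                "/user/chxing/pinot/airlineStats/rawdata/2014/01/02/b.avro");

        // 'glob:*.avro' matches no absolute path: '*' cannot cross '/'.
        // Zero matched files would mean parallelize(files, 0) downstream.
        System.out.println(countMatches(files, "glob:*.avro"));    // 0
        // 'glob:**/*.avro' matches files in any subdirectory
        System.out.println(countMatches(files, "glob:**/*.avro")); // 2
    }
}
```

If this reading is right, a pattern like `glob:**/*.avro` (as used in the Pinot examples) combined with the single-day `inputDirURI` may be worth trying.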
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]