Hello,
I'd like to count an RDD with more than Int.MaxValue elements, but I ran
into the following error.

scala> val rdd = sc.parallelize(1L to Int.MaxValue*2.toLong)
rdd: org.apache.spark.rdd.RDD[Long] = ParallelCollectionRDD[28] at parallelize at <console>:24

scala> rdd.count
java.lang.IllegalArgumentException: More than Int.MaxValue elements.
  at scala.collection.immutable.NumericRange$.check$1(NumericRange.scala:304)
  at scala.collection.immutable.NumericRange$.count(NumericRange.scala:314)
  at scala.collection.immutable.NumericRange.numRangeElements$lzycompute(NumericRange.scala:52)
  at scala.collection.immutable.NumericRange.numRangeElements(NumericRange.scala:51)
  at scala.collection.immutable.NumericRange.length(NumericRange.scala:54)
  at org.apache.spark.rdd.ParallelCollectionRDD$.slice(ParallelCollectionRDD.scala:145)
  at org.apache.spark.rdd.ParallelCollectionRDD.getPartitions(ParallelCollectionRDD.scala:97)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
  at org.apache.spark.rdd.RDD.count(RDD.scala:1158)
  ... 48 elided

How can I avoid this error? (Two workaround sketches follow after the second trace.)
The same error occurs with reduce:
scala> rdd.reduce((a,b)=> (a + b))
java.lang.IllegalArgumentException: More than Int.MaxValue elements.
  at scala.collection.immutable.NumericRange$.check$1(NumericRange.scala:304)
  at scala.collection.immutable.NumericRange$.count(NumericRange.scala:314)
  at scala.collection.immutable.NumericRange.numRangeElements$lzycompute(NumericRange.scala:52)
  at scala.collection.immutable.NumericRange.numRangeElements(NumericRange.scala:51)
  at scala.collection.immutable.NumericRange.length(NumericRange.scala:54)
  at org.apache.spark.rdd.ParallelCollectionRDD$.slice(ParallelCollectionRDD.scala:145)
  at org.apache.spark.rdd.ParallelCollectionRDD.getPartitions(ParallelCollectionRDD.scala:97)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2119)
  at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1026)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.RDD.reduce(RDD.scala:1008)
  ... 48 elided
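
From both traces, the failure happens before any work runs on the cluster:
ParallelCollectionRDD.slice asks the NumericRange for its length, and a Scala
NumericRange cannot report more than Int.MaxValue elements. One workaround
that seems to avoid this is sc.range, which takes Long bounds and generates
each partition's elements with an iterator instead of materializing a range
on the driver. A minimal sketch, assuming a spark-shell session (so sc is in
scope) on Spark 1.4 or later:

    // sc.range(start, end) is end-exclusive and Long-based; the elements
    // are produced inside each partition, so no NumericRange with a
    // length above Int.MaxValue is ever built on the driver.
    val big = sc.range(1L, Int.MaxValue * 2L + 1)
    big.count   // should be 4294967294, returned as a Long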
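
If parallelize itself is required, an alternative sketch is to keep every
sub-range at or below Int.MaxValue elements and union the pieces. Both count
and reduce should then work, and the sum of 1 to 4294967294 still fits in a
Long:

    // Each sub-range has exactly Int.MaxValue elements, so
    // ParallelCollectionRDD.slice can compute its length; union then
    // concatenates the two RDDs without counting anything up front.
    val first  = sc.parallelize(1L to Int.MaxValue.toLong)
    val second = sc.parallelize(Int.MaxValue.toLong + 1 to Int.MaxValue * 2L)
    val big2   = first.union(second)
    big2.count           // 4294967294
    big2.reduce(_ + _)   // 9223372030412324865, just under Long.MaxValue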
