Hello, I'd like to count more than Int.MaxValue elements in an RDD, but I ran into the following error.
scala> val rdd = sc.parallelize(1L to Int.MaxValue*2.toLong)
rdd: org.apache.spark.rdd.RDD[Long] = ParallelCollectionRDD[28] at parallelize at <console>:24

scala> rdd.count
java.lang.IllegalArgumentException: More than Int.MaxValue elements.
  at scala.collection.immutable.NumericRange$.check$1(NumericRange.scala:304)
  at scala.collection.immutable.NumericRange$.count(NumericRange.scala:314)
  at scala.collection.immutable.NumericRange.numRangeElements$lzycompute(NumericRange.scala:52)
  at scala.collection.immutable.NumericRange.numRangeElements(NumericRange.scala:51)
  at scala.collection.immutable.NumericRange.length(NumericRange.scala:54)
  at org.apache.spark.rdd.ParallelCollectionRDD$.slice(ParallelCollectionRDD.scala:145)
  at org.apache.spark.rdd.ParallelCollectionRDD.getPartitions(ParallelCollectionRDD.scala:97)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
  at org.apache.spark.rdd.RDD.count(RDD.scala:1158)
  ... 48 elided

How can I avoid the error?

A similar problem occurs with reduce:

scala> rdd.reduce((a,b)=> (a + b))
java.lang.IllegalArgumentException: More than Int.MaxValue elements.
  at scala.collection.immutable.NumericRange$.check$1(NumericRange.scala:304)
  at scala.collection.immutable.NumericRange$.count(NumericRange.scala:314)
  at scala.collection.immutable.NumericRange.numRangeElements$lzycompute(NumericRange.scala:52)
  at scala.collection.immutable.NumericRange.numRangeElements(NumericRange.scala:51)
  at scala.collection.immutable.NumericRange.length(NumericRange.scala:54)
  at org.apache.spark.rdd.ParallelCollectionRDD$.slice(ParallelCollectionRDD.scala:145)
  at org.apache.spark.rdd.ParallelCollectionRDD.getPartitions(ParallelCollectionRDD.scala:97)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2119)
  at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1026)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.RDD.reduce(RDD.scala:1008)
  ... 48 elided
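
One workaround I'm considering (not sure if it's the right approach) is to build the RDD with sc.range instead of parallelizing a Scala range, since sc.range generates the numbers per partition rather than materializing a NumericRange of more than Int.MaxValue elements on the driver. A rough sketch, assuming sc is the spark-shell SparkContext as above:

// Rough sketch of a possible workaround: sc.range builds the RDD[Long]
// lazily per partition, so no single Scala NumericRange longer than
// Int.MaxValue has to be created on the driver.
val n = Int.MaxValue * 2.toLong        // 4294967294, same upper bound as above
val bigRdd = sc.range(1L, n + 1L)      // elements 1L to n; the end is exclusive
bigRdd.count                           // should not hit the NumericRange length check
bigRdd.reduce((a, b) => a + b)         // same reduce as above, on the new RDD

Is this a reasonable way around the limit, or is there a better approach?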