[
https://issues.apache.org/jira/browse/SPARK-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Guoqiang Li updated SPARK-4902:
-------------------------------
Description:
{{CacheManager.getOrCompute}} returns an instance of InterruptibleIterator that
wraps an array or, when there is not enough memory, an iterator.
The GapSamplingIterator implementation is as follows:
{code}
private val iterDrop: Int => Unit = {
  val arrayClass = Array.empty[T].iterator.getClass
  val arrayBufferClass = ArrayBuffer.empty[T].iterator.getClass
  data.getClass match {
    case `arrayClass` => ((n: Int) => { data = data.drop(n) })
    case `arrayBufferClass` => ((n: Int) => { data = data.drop(n) })
    case _ => ((n: Int) => {
      var j = 0
      while (j < n && data.hasNext) {
        data.next()
        j += 1
      }
    })
  }
}
{code}
The code does not handle InterruptibleIterator, so the class match always falls
through to the default case when the data comes from the cache.
As a result, the following code cannot use the {{Iterator.drop}} method:
{code}
rdd.cache()
rdd.sample(false,0.1)
{code}
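To illustrate the problem outside of Spark, here is a minimal, self-contained sketch. {{WrappedIterator}} is a hypothetical stand-in for InterruptibleIterator (it only forwards to a delegate), and {{usesFastDrop}} and {{unwrap}} are names invented for this example. Unwrapping the delegate before the class check is one possible approach, shown under these assumptions; it is not necessarily the change that should go into Spark.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag

// Hypothetical stand-in for InterruptibleIterator: forwards every call to a delegate.
class WrappedIterator[T](val delegate: Iterator[T]) extends Iterator[T] {
  override def hasNext: Boolean = delegate.hasNext
  override def next(): T = delegate.next()
}

object DropDispatch {
  // Mirrors the class-based check in iterDrop: true when the fast drop(n) path is chosen.
  def usesFastDrop[T: ClassTag](data: Iterator[T]): Boolean = {
    val arrayClass = Array.empty[T].iterator.getClass
    val arrayBufferClass = ArrayBuffer.empty[T].iterator.getClass
    data.getClass match {
      case `arrayClass` => true
      case `arrayBufferClass` => true
      case _ => false
    }
  }

  // Assumed workaround: peel off wrappers so the check sees the underlying iterator.
  @annotation.tailrec
  def unwrap[T](it: Iterator[T]): Iterator[T] = it match {
    case w: WrappedIterator[T @unchecked] => unwrap(w.delegate)
    case other => other
  }
}
```

With this sketch, {{DropDispatch.usesFastDrop(new WrappedIterator(Array(1, 2, 3).iterator))}} is false because the wrapper's class never equals either iterator class, while calling {{DropDispatch.unwrap}} first exposes the underlying iterator to the check.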
was:
{{CacheManager.getOrCompute}} returns an instance of InterruptibleIterator that
wraps an array or, when there is not enough memory, an iterator.
The GapSamplingIterator implementation is as follows:
{code}
private val iterDrop: Int => Unit = {
  val arrayClass = Array.empty[T].iterator.getClass
  val arrayBufferClass = ArrayBuffer.empty[T].iterator.getClass
  data.getClass match {
    case `arrayClass` => ((n: Int) => { data = data.drop(n) })
    case `arrayBufferClass` => ((n: Int) => { data = data.drop(n) })
    case _ => ((n: Int) => {
      var j = 0
      while (j < n && data.hasNext) {
        data.next()
        j += 1
      }
    })
  }
}
{code}
The code does not handle InterruptibleIterator.
As a result, the following code cannot use the {{Iterator.drop}} method:
{code}
rdd.cache()
data.sample(false,0.1)
{code}
> gap-sampling performance optimization
> -------------------------------------
>
> Key: SPARK-4902
> URL: https://issues.apache.org/jira/browse/SPARK-4902
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 1.2.0
> Reporter: Guoqiang Li