[
https://issues.apache.org/jira/browse/SPARK-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003184#comment-15003184
]
Nakul Jindal commented on SPARK-11392:
--------------------------------------
Sorry, it's been a while since I last worked on this.
[~yhuai] - After looking at the code, I am not entirely clear on what you mean
when you say
{quote}
If we call GroupedIterator's hasNext immediately after its next, we will
generate an extra group (CoGroupedIterator has this behavior).
{quote}
The title, however, makes sense to me: {{hasNext}} is not idempotent.
As I understand it, {{hasNext}} on an iterator should generally not modify the
underlying iterator, but in GroupedIterator it does.
I can think of two things we can do to make {{hasNext}} idempotent, both of
which are less than ideal:
* Eagerly evaluate the GroupedIterator - This is probably not what we want to
do.
* Do the work done in {{fetchNextGroupIterator}} twice, specifically this loop:
[L118-L120|https://github.com/apache/spark/blob/14d08b99085d4e609aeae0cf54d4584e860eb552/sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala#L118-L120]
{code}
while (input.hasNext && keyOrdering.compare(currentGroup, currentRow) == 0) {
  currentRow = input.next()
}
{code}
Once for {{hasNext}} and once for {{next}}. This obviously introduces some
inefficiency.
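To make the idempotency requirement concrete, here is a minimal sketch on a simplified stand-in, not Spark's actual GroupedIterator. The class name {{SimpleGroupedIterator}}, the {{pendingKey}} / {{currentKey}} fields, and the use of plain {{(K, V)}} pairs instead of rows with a {{keyOrdering}} are all assumptions made for illustration. The idea is that {{hasNext}} remembers that it has already advanced to the next group, so a repeated call does no further work.

```scala
// Hypothetical, simplified stand-in for GroupedIterator.
// Assumes the input is already sorted by key, as GroupedIterator does.
class SimpleGroupedIterator[K, V](input: Iterator[(K, V)])
    extends Iterator[(K, Iterator[V])] {

  private val buffered = input.buffered
  // Key of a group that hasNext has found but next() has not yet returned.
  private var pendingKey: Option[K] = None
  // Key of the group most recently returned by next().
  private var currentKey: Option[K] = None

  override def hasNext: Boolean = {
    if (pendingKey.isEmpty) {
      // Skip any unconsumed rows of the group that was last handed out.
      currentKey.foreach { k =>
        while (buffered.hasNext && buffered.head._1 == k) buffered.next()
      }
      currentKey = None
      if (buffered.hasNext) pendingKey = Some(buffered.head._1)
    }
    // Repeated calls land here without touching the input again.
    pendingKey.isDefined
  }

  override def next(): (K, Iterator[V]) = {
    if (!hasNext) throw new NoSuchElementException("no more groups")
    val key = pendingKey.get
    pendingKey = None
    currentKey = Some(key)
    // Lazily yields the values of this group only.
    val values = new Iterator[V] {
      def hasNext: Boolean = buffered.hasNext && buffered.head._1 == key
      def next(): V =
        if (hasNext) buffered.next()._2
        else throw new NoSuchElementException("group exhausted")
    }
    (key, values)
  }
}
```

In this sketch the skip loop runs at most once per group, and calling {{hasNext}} several times in a row returns the same answer without generating an extra group. As with the real class, a group's value iterator stops yielding once the outer iterator has moved on.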
*Thoughts?*
> GroupedIterator's hasNext is not idempotent
> -------------------------------------------
>
> Key: SPARK-11392
> URL: https://issues.apache.org/jira/browse/SPARK-11392
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Yin Huai
>
> If we call
> [GroupedIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala]'s
> {{hasNext}} immediately after its {{next}}, we will generate an extra group
> ([CoGroupedIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/CoGroupedIterator.scala]
> has this behavior).