[ 
https://issues.apache.org/jira/browse/SPARK-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003184#comment-15003184
 ] 

Nakul Jindal commented on SPARK-11392:
--------------------------------------

Sorry, it's been a while since I last worked on this. 

[~yhuai] - After looking at the code, I am not entirely clear on what you mean 
when you say
{quote}
If we call GroupedIterator's hasNext immediately after its next, we will 
generate an extra group (CoGroupedIterator has this behavior).
{quote}

The title, however, makes sense to me: {{hasNext}} is not idempotent. 
As I understand it, {{hasNext}} should in general not modify the state of the 
underlying iterator, but in GroupedIterator it does. 

I can think of two things we can do to make {{hasNext}} idempotent, both of 
which are less than ideal:

* Eagerly evaluate the GroupedIterator - This is probably not what we want to 
do. 
* Do the work done in {{fetchNextGroupIterator}} twice, specifically this loop: 
[L118-L120|https://github.com/apache/spark/blob/14d08b99085d4e609aeae0cf54d4584e860eb552/sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala#L118-L120]
{code}
while (input.hasNext && keyOrdering.compare(currentGroup, currentRow) == 0) {
    currentRow = input.next()
}
{code}
We would run it once for {{hasNext}} and once for {{next}}, which obviously 
introduces some inefficiency.
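
To make the idempotency concern concrete, here is a minimal, self-contained sketch of one possible alternative: a one-element lookahead, so that {{hasNext}} only drains whatever is left of the group already handed out and a repeated call has nothing more to do. This is not Spark's code. {{SimpleGroupedIterator}} is a hypothetical stand-in that assumes the input is an iterator of (key, value) pairs already sorted by key, and it uses {{==}} on keys instead of {{keyOrdering}}.

```scala
// Hypothetical, simplified stand-in for GroupedIterator (not the Spark class).
// Assumes `input` is already sorted so that equal keys are adjacent.
final class SimpleGroupedIterator[K, V](input: Iterator[(K, V)])
    extends Iterator[(K, Iterator[V])] {

  // BufferedIterator lets us peek at the next row without consuming it.
  private val in = input.buffered

  // Values of the group most recently returned by next(), if any.
  private var current: Iterator[V] = Iterator.empty

  def hasNext: Boolean = {
    // Skip any unread rows of the current group. Once the group is
    // exhausted this loop does nothing, so repeated calls are idempotent.
    while (current.hasNext) current.next()
    in.hasNext
  }

  def next(): (K, Iterator[V]) = {
    if (!hasNext) throw new NoSuchElementException("no more groups")
    val key = in.head._1
    current = new Iterator[V] {
      // The group ends when the input ends or the peeked key changes.
      def hasNext: Boolean = in.hasNext && in.head._1 == key
      def next(): V = {
        if (!hasNext) throw new NoSuchElementException("end of group")
        in.next()._2
      }
    }
    (key, current)
  }
}
```

With this shape, calling {{hasNext}} any number of times after {{next}} gives the same answer and does not produce an extra group. As in the loop quoted above, calling {{hasNext}} still discards whatever part of the previous group has not been read.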

*Thoughts?*


> GroupedIterator's hasNext is not idempotent
> -------------------------------------------
>
>                 Key: SPARK-11392
>                 URL: https://issues.apache.org/jira/browse/SPARK-11392
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Yin Huai
>
> If we call 
> [GroupedIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/GroupedIterator.scala]'s
>  {{hasNext}} immediately after its {{next}}, we will generate an extra group 
> ([CoGroupedIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/CoGroupedIterator.scala]
>  has this behavior). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
