GitHub user mengxr commented on the pull request:
https://github.com/apache/spark/pull/136#issuecomment-37734634
It is hard to say what threshold to use. I couldn't think of a use case
that requires a large window size, but I can't rule one out.
Another possible approach is to pass all parent partitions to
SlidingRDDPartition and then, in compute(), retrieve the tail to append. If
we find that we need to scan many partitions to assemble the tail, we log a
warning. I'm not sure whether this would be more efficient than the current
implementation.
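A minimal sketch of the alternative described above, written in plain Python rather than Spark so it runs on its own. It assumes partitions are modeled as lists and every partition can see all parent partitions. `compute_partition` and `warn_after` are hypothetical names standing in for `SlidingRDDPartition`/`compute()` and the unspecified warning threshold. This is not the PR's implementation. For partition `index`, it scans forward through later partitions until it has `window - 1` tail elements. It warns if that takes more than `warn_after` partitions, then emits every full window that starts inside the partition.

```python
import warnings
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def compute_partition(parents: Sequence[Sequence[T]], index: int, window: int,
                      warn_after: int = 2) -> List[List[T]]:
    """Return the sliding windows whose first element lies in parents[index].

    Hypothetical sketch: all parent partitions are visible here, and the
    tail (the next window - 1 elements) is assembled lazily at compute time.
    """
    tail: List[T] = []
    scanned = 0
    # Walk the following partitions until the tail holds window - 1 elements.
    for part in parents[index + 1:]:
        if len(tail) >= window - 1:
            break
        scanned += 1
        tail.extend(part[: window - 1 - len(tail)])
    # Assumed policy: warn when too many partitions were needed for the tail.
    if scanned > warn_after:
        warnings.warn(
            f"partition {index}: scanned {scanned} partitions to build the tail")
    items = list(parents[index]) + tail
    n_start = len(parents[index])
    # Keep only complete windows that start inside this partition.
    return [items[s:s + window] for s in range(n_start) if s + window <= len(items)]
```

With partitions `[[1, 2, 3], [4], [], [5, 6]]` and `window=3`, concatenating the per-partition results yields `[[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]`. Partition 0 has to scan three later partitions (one of them empty) to collect its two tail elements, so it triggers the warning under the default threshold. The design trades a precomputed tail for a scan at compute time. As the comment notes, whether that is cheaper than the current implementation is an open question.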