[
https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652968#comment-15652968
]
Nicholas Chammas commented on SPARK-18367:
------------------------------------------
Even if I cut the number of records I'm processing with this new code base down
to literally 10-20 records, I'm still seeing Spark open north of 6,000 files.
This smells like a resource leak or a bug.
I will try to boil this down to a minimal repro that I can share.
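For reference, a rough sketch of one way to sanity-check that open-file count
(the helper below is illustrative and assumes the {{psutil}} package; it just
sums {{num_fds()}} across local Spark JVM processes):
{code}
import psutil

def spark_open_files():
    # Sum the file descriptors held by any local process whose command line
    # mentions org.apache.spark; processes we can't inspect are skipped.
    # num_fds() is only available on Unix-like systems (macOS/Linux).
    total = 0
    for proc in psutil.process_iter():
        try:
            if any("org.apache.spark" in arg for arg in proc.cmdline()):
                total += proc.num_fds()
        except (psutil.AccessDenied, psutil.NoSuchProcess):
            continue
    return total

print(spark_open_files())
{code}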
> limit() makes the lame walk again
> ---------------------------------
>
> Key: SPARK-18367
> URL: https://issues.apache.org/jira/browse/SPARK-18367
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.1, 2.1.0
> Environment: Python 3.5, Java 8
> Reporter: Nicholas Chammas
> Attachments: plan-with-limit.txt, plan-without-limit.txt
>
>
> I have a complex DataFrame query that fails to run normally but succeeds if I
> add a dummy {{limit()}} upstream in the query tree.
> The failure presents itself like this:
> {code}
> ERROR DiskBlockObjectWriter: Uncaught exception while reverting partial
> writes to file
> /private/var/folders/f5/t48vxz555b51mr3g6jjhxv400000gq/T/blockmgr-1e908314-1d49-47ba-8c95-fa43ff43cee4/31/temp_shuffle_6b939273-c6ce-44a0-99b1-db668e6c89dc
> java.io.FileNotFoundException:
> /private/var/folders/f5/t48vxz555b51mr3g6jjhxv400000gq/T/blockmgr-1e908314-1d49-47ba-8c95-fa43ff43cee4/31/temp_shuffle_6b939273-c6ce-44a0-99b1-db668e6c89dc
> (Too many open files in system)
> {code}
> My {{ulimit -n}} is already set to 10,000, and I can't set it much higher on
> macOS. However, I don't think that's the issue, since if I add a dummy
> {{limit()}} early in the query tree (dummy in the sense that it does not actually
> reduce the number of rows queried), then the same query works.
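> To illustrate what I mean by a dummy {{limit()}}, here is a rough sketch (the
> names below are placeholders, not my actual query):
> {code}
> # Placeholder sketch: same ORC read, but with a no-op limit() inserted
> # right after the scan. 1000000 is far above the dataset's row count,
> # so the output is unchanged; only the physical plan differs.
> df = spark.read.orc(orc_path).limit(1000000)
> result = df.join(other_df, "id")
> result.write.parquet(output_path)
> {code}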
> I've diffed the physical query plans to see what this {{limit()}} is actually
> doing, and the diff is as follows:
> {code}
> diff plan-with-limit.txt plan-without-limit.txt
> 24,28c24
> < : : : +- *GlobalLimit 1000000
> < : : : +- Exchange SinglePartition
> < : : : +- *LocalLimit 1000000
> < : : : +- *Project [...]
> < : : : +- *Scan orc [...] Format: ORC, InputPaths: file:/..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<...
> ---
> > : : : +- *Scan orc [...] Format: ORC, InputPaths: file:/..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<...
> 49,53c45
> < : : +- *GlobalLimit 1000000
> < : : +- Exchange SinglePartition
> < : : +- *LocalLimit 1000000
> < : : +- *Project [...]
> < : : +- *Scan orc [...] Format: ORC, InputPaths: file:/..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<...
> ---
> > : : +- *Scan orc [] Format: ORC, InputPaths: file:/..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<...
> {code}
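> (For reference, a sketch of one way to capture these plans to text files for
> diffing; the helper below is illustrative, not the exact code I ran. PySpark's
> {{explain()}} prints to stdout, so the output has to be redirected.)
> {code}
> import contextlib
> import io
>
> def dump_plan(df, filename):
>     # explain(True) prints the logical and physical plans to stdout;
>     # capture that output and write it to a file so two runs can be
>     # compared with a plain diff.
>     buf = io.StringIO()
>     with contextlib.redirect_stdout(buf):
>         df.explain(True)
>     with open(filename, "w") as f:
>         f.write(buf.getvalue())
> {code}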
> Does this give any clues as to why this {{limit()}} is helping? Again, the
> 1000000 limit you can see in the plan is much higher than the cardinality of
> the dataset I'm reading, so there is no theoretical impact on the output. You
> can see the full query plans attached to this ticket.
> Unfortunately, I don't have a minimal reproduction for this issue, but I can
> work towards one given some clues about where to look.
> I'm seeing this behavior on 2.0.1 and on master at commit
> {{26e1c53aceee37e3687a372ff6c6f05463fd8a94}}.