[ 
https://issues.apache.org/jira/browse/DRILL-5617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068494#comment-16068494
 ] 

Andries Engelbrecht commented on DRILL-5617:
--------------------------------------------

Perhaps proper configuration will avoid this issue.

On most Hadoop distributions with HDFS there is a local temp location for 
MapReduce that should be used for Drill spill files. Placing spill data on 
general HDFS triggers replication, which can slow things down.

As an example, on MapR there are local volumes with replication 1 that can be 
used; in this case the spill locations won't overlap between nodes. See this 
link for configuration:
https://community.mapr.com/community/exchange/blog/2017/05/03/top-5-items-to-configure-with-drill-on-mapr-5x

Similar best practices should be leveraged for other deployments.
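
As a minimal sketch of that idea (not a tested configuration), the fragment 
quoted in the issue description below could point at each node's local file 
system instead of the shared one. The directory path is simply reused from 
that fragment, and it is an assumption that it exists on a local disk on 
every node; only the fs value differs.

```hocon
hashagg: {
  spill: {
      # Assumption: this directory exists on a local disk on every node
      directories : [ "/tmp/drill/spill" ],
      # Local file system instead of the shared "maprfs:///"
      fs : "file:///"
   }
}
```

With a node-local location, spill files written by different nodes never share 
a directory, so identical file names cannot collide.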

> Spill file name collisions when spill file is on a shared file system
> ---------------------------------------------------------------------
>
>                 Key: DRILL-5617
>                 URL: https://issues.apache.org/jira/browse/DRILL-5617
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.11.0
>            Reporter: Chun Chang
>            Assignee: Paul Rogers
>
> The spill location can be configured to be on a shared file system such as HDFS, for example:
>   hashagg: {
>     # The partitions divide the work inside the hashagg, to ease
>     # handling spilling. This initial figure is tuned down when
>     # memory is limited.
>     #  Setting this option to 1 disables spilling !
>     num_partitions: 32,
>     spill: {
>         # The 2 options below override the common ones
>         # they should be deprecated in the future
>         directories : [ "/tmp/drill/spill" ],
>         fs : "maprfs:///"
>      }
>   }
> However, this could cause spill file name conflicts, since the naming 
> convention does not include the node name.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
