[ https://issues.apache.org/jira/browse/SQOOP-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mate Juhasz updated SQOOP-3439:
-------------------------------
Description:
Found an interesting behaviour during a Sqoop export to an Oracle database, though it is not Oracle-specific.

The debug-level YARN application logs show that the Application Master requests containers correctly across multiple nodes, so the resource requests are fine: the MR AM asks for NODE_LOCAL containers on all hosts. The Resource Manager nevertheless allocates the containers on only one node.

The hdfs fsck output shows that the blocks are spread across all datanodes, but the 200 blocks are grouped into 12 map splits by the input format class SqoopHCatExportFormat.

The issue appears to be caused by the logic in SqoopHCatInputSplit#getLocations, which takes the union of the block locations of all the grouped sub-splits. In our case, with 200 splits grouped into 12, each grouped split contains on average 16 sub-splits. Even if only 1 of those 16 sub-splits has a block on a particular datanode (e.g. hdpadn0007.analytics.cccis.com), the grouped split will request a NODE_LOCAL container on that node. As a result, all 12 splits are considered NODE_LOCAL to almost every datanode in the cluster, which leads to the issue.
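To illustrate the effect, here is a small standalone sketch (hypothetical class and method names, not actual Sqoop code) of the union done in getLocations, together with one possible direction for a fix: only advertising hosts that hold blocks for a meaningful share of the grouped split's sub-splits.

```java
import java.util.*;
import java.util.stream.*;

public class GroupedSplitLocations {

    // Stand-in for the union done in SqoopHCatInputSplit#getLocations:
    // every host that holds a block of ANY sub-split is reported.
    static String[] unionLocations(List<String[]> subSplits) {
        Set<String> locations = new HashSet<>();
        for (String[] hosts : subSplits) {
            locations.addAll(Arrays.asList(hosts));
        }
        return locations.toArray(new String[0]);
    }

    // Hypothetical alternative (an assumption, not an existing Sqoop API):
    // only advertise hosts holding blocks for at least a minShare fraction
    // of the sub-splits, so a single stray block no longer makes the whole
    // grouped split NODE_LOCAL to that host.
    static String[] majorityLocations(List<String[]> subSplits, double minShare) {
        Map<String, Long> counts = subSplits.stream()
                .flatMap(Arrays::stream)
                .collect(Collectors.groupingBy(h -> h, Collectors.counting()));
        long threshold = (long) Math.ceil(minShare * subSplits.size());
        return counts.entrySet().stream()
                .filter(e -> e.getValue() >= threshold)
                .map(Map.Entry::getKey)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // 16 sub-splits: 15 with blocks on dn01, one lone block on dn07.
        List<String[]> subSplits = new ArrayList<>();
        for (int i = 0; i < 15; i++) subSplits.add(new String[] {"dn01"});
        subSplits.add(new String[] {"dn07"});

        System.out.println(Arrays.toString(unionLocations(subSplits)));
        System.out.println(Arrays.toString(majorityLocations(subSplits, 0.5)));
    }
}
```

With 15 of 16 sub-splits on dn01 and a single stray block on dn07, the union reports both hosts as local, while the filtered variant reports only dn01.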
{code}
@Override
public String[] getLocations() throws IOException, InterruptedException {
  if (this.hCatLocations == null) {
    Set<String> locations = new HashSet<String>();
    for (HCatSplit split : this.hCatSplits) {
      locations.addAll(Arrays.asList(split.getLocations()));
    }
    this.hCatLocations = locations.toArray(new String[0]);
  }
  return this.hCatLocations;
}
{code}

I also tried the --direct option, where I believe OraOopDBInputSplit should be used instead, but the result is the same:

{code}
@Override
public String[] getLocations() throws IOException {
  if (this.splitLocation.isEmpty()) {
    return new String[] {};
  } else {
    return new String[] { this.splitLocation };
  }
}
{code}

was:
Found an interesting behaviour during a Sqoop export to an Oracle database, though it is not Oracle-specific.

The debug-level YARN application logs show that the Application Master requests containers correctly across multiple nodes, so the resource requests are fine: the MR AM asks for NODE_LOCAL containers on all hosts. The Resource Manager nevertheless allocates the containers on only one node.

The hdfs fsck output shows that the blocks are spread across all datanodes, but the 200 blocks are grouped into 12 map splits by the input format class SqoopHCatExportFormat.

The issue appears to be caused by the logic in SqoopHCatInputSplit#getLocations, which takes the union of the block locations of all the grouped sub-splits.
{code}
@Override
public String[] getLocations() throws IOException, InterruptedException {
  if (this.hCatLocations == null) {
    Set<String> locations = new HashSet<String>();
    for (HCatSplit split : this.hCatSplits) {
      locations.addAll(Arrays.asList(split.getLocations()));
    }
    this.hCatLocations = locations.toArray(new String[0]);
  }
  return this.hCatLocations;
}
{code}

I also tried the --direct option, where I believe OraOopDBInputSplit should be used instead, but the result is the same:

{code}
@Override
public String[] getLocations() throws IOException {
  if (this.splitLocation.isEmpty()) {
    return new String[] {};
  } else {
    return new String[] { this.splitLocation };
  }
}
{code}

> Sqoop export - All mappers are launched on the same node manager
> ----------------------------------------------------------------
>
>                 Key: SQOOP-3439
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3439
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.7
>            Reporter: Mate Juhasz
>            Priority: Major
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)