[jira] [Updated] (SQOOP-1968) Optimize schema operation in getMatchingData of NameMatcher

Jerry Chen (JIRA) Mon, 05 Jan 2015 00:26:36 -0800

     [ https://issues.apache.org/jira/browse/SQOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jerry Chen updated SQOOP-1968:
------------------------------
    Description: 
Two performance issues were found in the Matcher implementations.

1. In getMatchingData of NameMatcher, the following code block builds a 
HashMap whose contents do not change across getMatchingData calls. The 
HashMap can be built only once, in the constructor.
{color:red}
    HashMap<String,Column> colNames = new HashMap<String, Column>();

    for (Column fromCol: getFromSchema().getColumnsArray()) {
      colNames.put(fromCol.getName(), fromCol);
    }
{color}
2. In getMatchingData of NameMatcher, calling indexOf on a List is not 
efficient: it usually loops over the elements to find the object and return 
its index. To improve this, we can simply store the index in the above 
HashMap and retrieve it directly with a HashMap lookup.
{color:red}
int fromIndex = getFromSchema().getColumnsList().indexOf(fromCol);
{color}
These performance problems are critical because getMatchingData is called 
repeatedly, once for each record.
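
As an illustration only, both points can be combined by building a name-to-index 
map once in the constructor. This is a minimal sketch, not the actual Sqoop code: 
Column here is a simplified stand-in for the Sqoop class, CachedNameMatcher and 
fromIndexByName are hypothetical names, and leaving unmatched target columns as 
null is an assumption rather than the behavior of the real NameMatcher.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Sqoop's Column; only the name is modeled.
final class Column {
    private final String name;

    Column(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Hypothetical sketch of a matcher that caches the schema lookup.
final class CachedNameMatcher {
    // Built once: column name -> position of that column in the "from" schema.
    private final Map<String, Integer> fromIndexByName = new HashMap<>();
    private final List<Column> toColumns;

    CachedNameMatcher(List<Column> fromColumns, List<Column> toColumns) {
        this.toColumns = new ArrayList<>(toColumns);
        for (int i = 0; i < fromColumns.size(); i++) {
            fromIndexByName.put(fromColumns.get(i).getName(), i);
        }
    }

    // Called once per record: no map rebuild and no List.indexOf scan.
    Object[] getMatchingData(Object[] fields) {
        Object[] out = new Object[toColumns.size()];
        for (int i = 0; i < toColumns.size(); i++) {
            Integer fromIndex = fromIndexByName.get(toColumns.get(i).getName());
            // Assumption: a target column with no source match stays null.
            out[i] = (fromIndex == null) ? null : fields[fromIndex];
        }
        return out;
    }
}
```

With this shape, the per-record cost is one hash lookup per target column, 
instead of a map rebuild plus a linear indexOf scan per column.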

  was:
Two performance issues were found in the Matcher implementations.

1. In getMatchingData of NameMatcher, the following code block builds a 
HashMap whose contents do not change across getMatchingData calls. The 
HashMap can be built only once, in the constructor.

    HashMap<String,Column> colNames = new HashMap<String, Column>();

    for (Column fromCol: getFromSchema().getColumnsArray()) {
      colNames.put(fromCol.getName(), fromCol);
    }

2. In getMatchingData of NameMatcher, calling indexOf on a List is not 
efficient: it usually loops over the elements to find the object and return 
its index. To improve this, we can simply store the index in the above 
HashMap and retrieve it directly with a HashMap lookup.

int fromIndex = getFromSchema().getColumnsList().indexOf(fromCol);

These performance problems are critical because getMatchingData is called 
repeatedly, once for each record.


> Optimize schema operation in getMatchingData of NameMatcher
> -----------------------------------------------------------
>
>                 Key: SQOOP-1968
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1968
>             Project: Sqoop
>          Issue Type: Improvement
>          Components: connectors/generic
>    Affects Versions: 2.0.0
>            Reporter: Jerry Chen
>
> Two performance issues were found in the Matcher implementations.
> 1. In getMatchingData of NameMatcher, the following code block builds a 
> HashMap whose contents do not change across getMatchingData calls. The 
> HashMap can be built only once, in the constructor.
> {color:red}
>     HashMap<String,Column> colNames = new HashMap<String, Column>();
>     for (Column fromCol: getFromSchema().getColumnsArray()) {
>       colNames.put(fromCol.getName(), fromCol);
>     }
> {color}
> 2. In getMatchingData of NameMatcher, calling indexOf on a List is not 
> efficient: it usually loops over the elements to find the object and return 
> its index. To improve this, we can simply store the index in the above 
> HashMap and retrieve it directly with a HashMap lookup.
> {color:red}
> int fromIndex = getFromSchema().getColumnsList().indexOf(fromCol);
> {color}
> These performance problems are critical because getMatchingData is called 
> repeatedly, once for each record.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
