Jerry Chen created SQOOP-1968:
---------------------------------

             Summary: Optimize schema operation in getMatchingData of NameMatcher
                 Key: SQOOP-1968
                 URL: https://issues.apache.org/jira/browse/SQOOP-1968
             Project: Sqoop
          Issue Type: Improvement
          Components: connectors/generic
    Affects Versions: 2.0.0
            Reporter: Jerry Chen


Two performance issues were found in the Matcher implementations.

1. In getMatchingData of NameMatcher, the following code block builds a 
HashMap whose contents do not change across getMatchingData calls. The 
HashMap can be built once in the constructor.

    HashMap<String,Column> colNames = new HashMap<String, Column>();

    for (Column fromCol: getFromSchema().getColumnsArray()) {
      colNames.put(fromCol.getName(), fromCol);
    }

2. In getMatchingData of NameMatcher, calling indexOf on a List is not 
efficient. It typically loops over the list to find the object and return 
its index. To improve this, we can simply store the index in the above 
HashMap and retrieve it directly with a HashMap lookup. The call in question:

int fromIndex = getFromSchema().getColumnsList().indexOf(fromCol);

These performance problems are critical because getMatchingData is called 
repeatedly, once for each record.
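
A minimal sketch of both changes combined, for illustration only. It is not 
the actual Sqoop code: the class NameIndexSketch, its nested Column stand-in, 
and the field fromIndexByName are hypothetical names, and leaving unmatched 
target columns as null is a simplifying assumption. The name-to-index map is 
built once in the constructor, so the per-record path does one hash lookup 
per target column and no list scan.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NameIndexSketch {
    // Hypothetical stand-in for a schema column; only the name matters here.
    static final class Column {
        private final String name;
        Column(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Column name -> position in the FROM schema, built once per matcher.
    private final Map<String, Integer> fromIndexByName = new HashMap<>();
    private final List<Column> toColumns;

    NameIndexSketch(List<Column> fromColumns, List<Column> toColumns) {
        this.toColumns = toColumns;
        for (int i = 0; i < fromColumns.size(); i++) {
            fromIndexByName.put(fromColumns.get(i).getName(), i);
        }
    }

    // Per-record path: no map construction and no indexOf scan.
    Object[] getMatchingData(Object[] fields) {
        Object[] out = new Object[toColumns.size()];
        for (int i = 0; i < out.length; i++) {
            Integer fromIndex = fromIndexByName.get(toColumns.get(i).getName());
            // Assumption: a target column with no match is left as null.
            out[i] = (fromIndex == null) ? null : fields[fromIndex];
        }
        return out;
    }

    public static void main(String[] args) {
        NameIndexSketch matcher = new NameIndexSketch(
            Arrays.asList(new Column("a"), new Column("b")),
            Arrays.asList(new Column("b"), new Column("c"), new Column("a")));
        // prints [2, null, 1]
        System.out.println(Arrays.toString(matcher.getMatchingData(new Object[] {1, 2})));
    }
}
```

With N source columns and M target columns, the per-record cost drops from 
roughly N map insertions plus M linear scans to M hash lookups.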



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
