Jerry Chen created SQOOP-1968:
---------------------------------
Summary: Optimize schema operation in getMatchingData of NameMatcher
Key: SQOOP-1968
URL: https://issues.apache.org/jira/browse/SQOOP-1968
Project: Sqoop
Issue Type: Improvement
Components: connectors/generic
Affects Versions: 2.0.0
Reporter: Jerry Chen
Two performance issues were found in the Matcher implementations.
1. In getMatchingData of NameMatcher, the following code block builds a
HashMap whose contents do not change across getMatchingData calls. The HashMap
can be built once, in the constructor.
HashMap<String, Column> colNames = new HashMap<String, Column>();
for (Column fromCol : getFromSchema().getColumnsArray()) {
    colNames.put(fromCol.getName(), fromCol);
}
2. In getMatchingData of NameMatcher, calling indexOf on a List is not
efficient. It typically scans the list linearly to find the object and return
its index. To improve this, we can simply store the index in the HashMap above
and retrieve it with a direct HashMap lookup:
int fromIndex = getFromSchema().getColumnsList().indexOf(fromCol);
These performance problems are critical because getMatchingData is called
repeatedly, once for each record.
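A minimal sketch of how the two changes could fit together is shown here. It
is not the actual NameMatcher code: Column is a hypothetical stand-in for the
Sqoop class, IndexedNameMatcher and fromIndexByName are made-up names, and the
constructor takes plain column lists instead of Schema objects so the example
is self-contained. Target columns with no match are simply left null; the real
matcher's handling of missing columns is not reproduced.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Sqoop's Column; only the name matters here.
final class Column {
    private final String name;

    Column(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Hypothetical sketch, not the real NameMatcher.
final class IndexedNameMatcher {
    // Built once in the constructor: column name -> position in the from schema.
    private final Map<String, Integer> fromIndexByName = new HashMap<>();
    private final List<Column> toColumns;

    IndexedNameMatcher(List<Column> fromColumns, List<Column> toColumns) {
        this.toColumns = toColumns;
        for (int i = 0; i < fromColumns.size(); i++) {
            fromIndexByName.put(fromColumns.get(i).getName(), i);
        }
    }

    // Per-record work is one hash lookup per target column; no map rebuild
    // and no List.indexOf scan.
    Object[] getMatchingData(Object[] fields) {
        Object[] out = new Object[toColumns.size()];
        for (int i = 0; i < out.length; i++) {
            Integer fromIndex = fromIndexByName.get(toColumns.get(i).getName());
            // Assumption of this sketch: unmatched columns stay null.
            out[i] = (fromIndex == null) ? null : fields[fromIndex];
        }
        return out;
    }
}
```

With this layout the cost of building the map is paid once per matcher
instance, and each record only pays for the lookups.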
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)