[
https://issues.apache.org/jira/browse/SQOOP-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jerry Chen updated SQOOP-1968:
------------------------------
Description:
Two performance issues were found in the Matcher implementations.
1. In getMatchingData of NameMatcher, the following code block builds a
HashMap whose contents do not change across getMatchingData calls. The HashMap
can be built once, in the constructor.
{color:red}
HashMap<String,Column> colNames = new HashMap<String, Column>();
for (Column fromCol: getFromSchema().getColumnsArray()) {
  colNames.put(fromCol.getName(), fromCol);
}
{color}
2. In getMatchingData of NameMatcher, calling indexOf on a List is not
efficient. It usually loops over the list to find the object and return its
index. To improve this, we can simply store the index in the HashMap above and
retrieve it directly with a HashMap lookup.
{color:red}
int fromIndex = getFromSchema().getColumnsList().indexOf(fromCol);
{color}
These performance problems are critical because getMatchingData is called
repeatedly, once for each record.
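The two changes proposed above could be combined roughly as follows. This is
only a minimal sketch, not the actual Sqoop code: NameMatcherSketch, its nested
Column class, and the fromIndexByName field are hypothetical stand-ins, the
schemas are reduced to plain column lists, and returning null for a column
with no match is a simplifying assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for NameMatcher; not the real Sqoop class.
final class NameMatcherSketch {

    // Hypothetical stand-in for Sqoop's Column; only the name matters here.
    public static final class Column {
        private final String name;
        public Column(String name) { this.name = name; }
        public String getName() { return name; }
    }

    // Column name -> position in the "from" schema, built once (issue 1).
    private final Map<String, Integer> fromIndexByName = new HashMap<String, Integer>();
    private final List<Column> toColumns;

    public NameMatcherSketch(List<Column> fromColumns, List<Column> toColumns) {
        this.toColumns = toColumns;
        for (int i = 0; i < fromColumns.size(); i++) {
            fromIndexByName.put(fromColumns.get(i).getName(), i);
        }
    }

    // Called once per record: no map rebuild and no List.indexOf scan (issue 2).
    public Object[] getMatchingData(Object[] fields) {
        Object[] out = new Object[toColumns.size()];
        for (int i = 0; i < toColumns.size(); i++) {
            Integer fromIndex = fromIndexByName.get(toColumns.get(i).getName());
            // Assumption: a "to" column without a matching "from" column gets null.
            out[i] = (fromIndex == null) ? null : fields[fromIndex];
        }
        return out;
    }

    public static void main(String[] args) {
        NameMatcherSketch matcher = new NameMatcherSketch(
            Arrays.asList(new Column("id"), new Column("name")),
            Arrays.asList(new Column("name"), new Column("id")));
        Object[] row = matcher.getMatchingData(new Object[] {1, "alice"});
        System.out.println(Arrays.toString(row));
    }
}
```

With this layout the per-record cost is one HashMap lookup per target column,
instead of rebuilding the map and scanning the column list for every record.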
> Optimize schema operation in getMatchingData of NameMatcher
> -----------------------------------------------------------
>
> Key: SQOOP-1968
> URL: https://issues.apache.org/jira/browse/SQOOP-1968
> Project: Sqoop
> Issue Type: Improvement
> Components: connectors/generic
> Affects Versions: 2.0.0
> Reporter: Jerry Chen