[
https://issues.apache.org/jira/browse/SQOOP-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276970#comment-14276970
]
Qian Xu commented on SQOOP-2010:
--------------------------------
Matching means rearranging the fields' positions in every data record. I think
[~jarcec] mentioned a case we can optimize: if the FROM and TO schemas are
identical and the FROM and TO IDFs are identical, we can skip matching and simply do
{{toIDF.setData(fromIDF.getData())}}.
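To make the per-record cost concrete, here is a minimal sketch of matching by column name in plain Java. It is not the {{matcher}} object used in SqoopMapper; it assumes fields are matched by name and that a TO field with no FROM counterpart becomes {{null}}, and {{FieldMatcherSketch}}, {{FieldMatcher}}, {{isIdentity()}} and {{match()}} are hypothetical names. The FROM-to-TO index mapping is built once, so the per-record work is a single array copy, and when the mapping is the identity even that copy can be skipped.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FieldMatcherSketch {

    /** Hypothetical matcher: reorders FROM fields into TO order by column name. */
    static final class FieldMatcher {
        // For each TO position, the FROM position holding that column (-1 if absent).
        private final int[] fromIndexForTo;
        private final boolean identity;

        FieldMatcher(List<String> fromFields, List<String> toFields) {
            Map<String, Integer> fromPos = new HashMap<>();
            for (int i = 0; i < fromFields.size(); i++) {
                fromPos.put(fromFields.get(i), i);
            }
            fromIndexForTo = new int[toFields.size()];
            boolean same = fromFields.size() == toFields.size();
            for (int j = 0; j < toFields.size(); j++) {
                int i = fromPos.getOrDefault(toFields.get(j), -1);
                fromIndexForTo[j] = i;
                same &= (i == j);
            }
            identity = same; // computed once, not per record
        }

        boolean isIdentity() {
            return identity;
        }

        /** Per-record work: one O(#fields) copy, or nothing when the schemas line up. */
        Object[] match(Object[] fromRecord) {
            if (identity) {
                return fromRecord; // shortcut: positions already agree
            }
            Object[] out = new Object[fromIndexForTo.length];
            for (int j = 0; j < out.length; j++) {
                int i = fromIndexForTo[j];
                out[j] = (i >= 0) ? fromRecord[i] : null; // assumption: missing column -> null
            }
            return out;
        }
    }

    public static void main(String[] args) {
        FieldMatcher reorder = new FieldMatcher(
                List.of("id", "name", "age"), List.of("name", "id", "age"));
        System.out.println(Arrays.toString(reorder.match(new Object[] {1, "a", 30}))); // [a, 1, 30]

        FieldMatcher same = new FieldMatcher(List.of("id", "name"), List.of("id", "name"));
        Object[] rec = {1, "a"};
        System.out.println(same.isIdentity() + " " + (same.match(rec) == rec)); // true true
    }
}
```

Building the mapping up front keeps the expensive part (name lookups) out of the per-record path; the identity check is what lets identical schemas avoid allocation entirely.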
> Matching is invoked on every record ( row ) we write, is not this super
> expensive?
> ----------------------------------------------------------------------------------
>
> Key: SQOOP-2010
> URL: https://issues.apache.org/jira/browse/SQOOP-2010
> Project: Sqoop
> Issue Type: Sub-task
> Reporter: Veena Basavaraj
> Fix For: 2.0.0
>
>
> The following code in SqoopMapper/Writer invokes the matching once the data
> is obtained from the "Extractor". Can we decide whether or not to invoke it
> when we are sure the FROM/TO schemas match?
> {code}
> @Override
> public void writeArrayRecord(Object[] array) {
>   fromIDF.setObjectData(array);
>   writeContent();
> }
>
> @Override
> public void writeStringRecord(String text) {
>   fromIDF.setCSVTextData(text);
>   writeContent();
> }
>
> @Override
> public void writeRecord(Object obj) {
>   fromIDF.setData(obj);
>   writeContent();
> }
>
> private void writeContent() {
>   try {
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Extracted data: " + fromIDF.getCSVTextData());
>     }
>     // NOTE: The fromIDF and the corresponding fromSchema are used only for the matching process.
>     // The output of the mappers is finally written to the toIDF object after the matching process,
>     // since the writable encapsulates the toIDF ==> new SqoopWritable(toIDF)
>     toIDF.setObjectData(matcher.getMatchingData(fromIDF.getObjectData()));
>     // NOTE: We do not use the reducer to do the writing (a.k.a. LOAD in ETL). Hence the mapper sets up the writable.
>     context.write(writable, NullWritable.get());
>   } catch (Exception e) {
>     throw new SqoopException(MRExecutionError.MAPRED_EXEC_0013, e);
>   }
> }
> {code}
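A minimal sketch of how {{writeContent()}} could take the shortcut, assuming a toy CSV-backed IDF whose native data is a {{String}}. {{WriterShortcutSketch}}, {{CsvIdf}}, {{passThrough}}, {{sameIdfType}} and the {{parses}} counter are hypothetical names, not Sqoop APIs, and the CSV handling is naive (no quoting or escaping). The pass-through decision is made once per writer, so identical FROM/TO schemas and IDFs skip both the String-to-Object[] parse and the re-serialization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class WriterShortcutSketch {

    /** Hypothetical CSV-backed IDF; native data is a String. Naive: no quoting/escaping. */
    static final class CsvIdf {
        private String text;
        int parses; // counts String -> Object[] conversions

        Object getData() { return text; }
        void setData(Object data) { text = (String) data; }

        Object[] getObjectData() {
            parses++;
            return text.split(",", -1);
        }

        void setObjectData(Object[] fields) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(fields[i]);
            }
            text = sb.toString();
        }
    }

    final CsvIdf fromIDF = new CsvIdf();
    final CsvIdf toIDF = new CsvIdf();
    final List<String> written = new ArrayList<>(); // stands in for context.write(...)
    private final UnaryOperator<Object[]> matcher;
    private final boolean passThrough; // decided once per writer, not per record

    WriterShortcutSketch(List<String> fromSchema, List<String> toSchema,
                         boolean sameIdfType, UnaryOperator<Object[]> matcher) {
        this.matcher = matcher;
        this.passThrough = sameIdfType && fromSchema.equals(toSchema);
    }

    void writeRecord(Object obj) {
        fromIDF.setData(obj);
        writeContent();
    }

    private void writeContent() {
        if (passThrough) {
            // Identical schemas and IDFs: hand the native data over, no parse or rearrange.
            toIDF.setData(fromIDF.getData());
        } else {
            toIDF.setObjectData(matcher.apply(fromIDF.getObjectData()));
        }
        written.add((String) toIDF.getData());
    }

    public static void main(String[] args) {
        List<String> schema = List.of("id", "name");
        WriterShortcutSketch fast = new WriterShortcutSketch(schema, schema, true, r -> r);
        fast.writeRecord("1,alice");
        System.out.println(fast.written + " parses=" + fast.fromIDF.parses); // [1,alice] parses=0

        // TO schema is (name, id), so the matcher swaps the two columns.
        WriterShortcutSketch slow = new WriterShortcutSketch(schema, List.of("name", "id"), true,
                r -> new Object[] {r[1], r[0]});
        slow.writeRecord("1,alice");
        System.out.println(slow.written + " parses=" + slow.fromIDF.parses); // [alice,1] parses=1
    }
}
```

Hoisting the check into the constructor means the hot path pays only one boolean test per record; with a real IDF the saving would depend on how costly its native-to-Object[] conversion is.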
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)