Veena Basavaraj created SQOOP-1989:
--------------------------------------

             Summary: IDF optimization to store object array in memory (so matching is faster)
                 Key: SQOOP-1989
                 URL: https://issues.apache.org/jira/browse/SQOOP-1989
             Project: Sqoop
          Issue Type: Sub-task
            Reporter: Veena Basavaraj


According to the IDF API, we never cache or store either the CSV text or the 
object array representation in memory; we only store the native format, as below:

{code}
  /**
   * Get one row of data.
   *
   * @return - One row of data, represented in the internal/native format of
   *         the intermediate data format implementation.
   */
  public T getData() {
    return data;
  }
{code}

But the matcher code in SqoopWritable always calls setObjectData and 
getObjectData on every row of the data, which means we exercise these calls no 
matter what the native format is:

        toIDF.setObjectData(matcher.getMatchingData(fromIDF.getObjectData()));

So should we not store the object array in memory?
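
To make the question concrete, here is a minimal sketch of what caching the object array could look like. It is not Sqoop's actual IDF code: the class name CachingCsvFormat, the parseCount counter, and the naive comma split (no quoting or escaping) are all assumptions made for illustration. Only the method names getData, getObjectData and setObjectData come from the description above.

```java
import java.util.Arrays;

// Hypothetical stand-in for an IDF implementation whose native format is CSV text.
// Not Sqoop's actual class: the name, fields, and naive CSV handling are assumptions.
class CachingCsvFormat {
    private String data;          // native format (CSV text); null when only the array is current
    private Object[] objectCache; // cached object array; null when it must be re-parsed
    private int parseCount;       // how many times the text was actually parsed

    public void setData(String csv) {
        data = csv;
        objectCache = null; // the native form changed, so the cached array is stale
    }

    public String getData() {
        if (data == null && objectCache != null) {
            // Serialize lazily, only when a caller asks for the native form.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < objectCache.length; i++) {
                if (i > 0) sb.append(',');
                sb.append(objectCache[i]);
            }
            data = sb.toString();
        }
        return data;
    }

    public void setObjectData(Object[] row) {
        // Keep the array as given (defensive copy) and mark the text as stale.
        objectCache = Arrays.copyOf(row, row.length, Object[].class);
        data = null;
    }

    public Object[] getObjectData() {
        if (objectCache == null && data != null) {
            // Parse once; later calls reuse the cached array until setData is called.
            String[] parts = data.split(",", -1);
            objectCache = Arrays.copyOf(parts, parts.length, Object[].class);
            parseCount++;
        }
        return objectCache == null ? null : objectCache.clone();
    }

    public int getParseCount() {
        return parseCount;
    }
}
```

With something along these lines, repeated getObjectData calls on the same row would parse the native text only once, and a setObjectData followed by getObjectData would not parse at all. The trade-off is holding two representations of a row in memory at the same time.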



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
