[jira] [Updated] (SQOOP-1799) Connector API : Ability for connector to indicate if its FROM and TO support incremental reading/ writing

Veena Basavaraj (JIRA) Thu, 15 Jan 2015 09:25:23 -0800

     [ 
https://issues.apache.org/jira/browse/SQOOP-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Veena Basavaraj updated SQOOP-1799:
-----------------------------------
    Description: 


No longer a necessity. If the connectors have delta read/write configs, we 
will display them, and the connectors will use those config values to do the 
appropriate form of reading from and writing to the data source. At this point, 
having this in the initializer API does not seem necessary; we can revisit if 
we need this information upfront for any form of validation when the job is 
created.

By default it is assumed the connectors will do a full fetch and a full write 
from a clean slate.
For instance, if the TO does not support delta records being written in some 
fashion, but the FROM side only gave a subset of records, we cannot expect 
delta append or merge (overwriting existing records with no dupes) to happen.
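
To make the difference between delta append and delta merge concrete, here is a minimal sketch. It is not Sqoop code: the class DeltaWriteSketch, the Row type, and both methods are hypothetical names invented for illustration, and the target is modelled as an in-memory list keyed by an integer id.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Sqoop.
class DeltaWriteSketch {

  // A record with a key and a value (assumed shape).
  static final class Row {
    final int id;
    final String value;

    Row(int id, String value) {
      this.id = id;
      this.value = value;
    }
  }

  // Delta append: the subset of records from the FROM side is added as-is,
  // so a key that already exists in the target appears twice.
  static List<Row> append(List<Row> existing, List<Row> delta) {
    List<Row> out = new ArrayList<>(existing);
    out.addAll(delta);
    return out;
  }

  // Delta merge: records are matched on the key, and an incoming record
  // overwrites the existing one, so the result holds no duplicates.
  static Map<Integer, String> merge(List<Row> existing, List<Row> delta) {
    Map<Integer, String> out = new LinkedHashMap<>();
    for (Row r : existing) {
      out.put(r.id, r.value);
    }
    for (Row r : delta) {
      out.put(r.id, r.value);
    }
    return out;
  }
}
```

A TO side that can only do a full write from a clean slate offers neither behaviour, which is why a FROM that returns only a subset of records cannot be paired with it safely.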

  was:
One suggestion would be to have a connector's FROM/TO initializer expose 
whether it even supports incremental. This can then be used to immediately 
validate the job at creation.

{code}
 sqoop > create incremental-job -f 1 -t 2 
{code}

Does the HDFS FROM support incremental read? Does this even apply? But surely 
the TO side should support the delta/incremental write.

Both the FROM connector and the TO connector have to support this feature 
before we proceed. The default will be false. The Initializer API will be 
updated to support this.


{code}

import java.util.LinkedList;
import java.util.List;

import org.apache.sqoop.schema.NullSchema;
import org.apache.sqoop.schema.Schema;

/**
 * This allows connector to define initialization work for execution,
 * for example, context configuration.
 */
public abstract class Initializer<LinkConfiguration, JobConfiguration> {

  /**
   * Initialize new submission based on given configuration properties. Any
   * needed temporary values might be saved to context object and they will be
   * promoted to all other parts of the workflow automatically.
   *
   * @param context Initializer context object
   * @param linkConfiguration link configuration object
   * @param jobConfiguration job configuration object for the FROM and TO.
   *        In case of the FROM initializer this will represent the FROM job configuration.
   *        In case of the TO initializer this will represent the TO job configuration.
   */
  public abstract void initialize(InitializerContext context,
      LinkConfiguration linkConfiguration, JobConfiguration jobConfiguration);

  /**
   * Return list of all jars that this particular connector needs to operate on
   * the following job. This method will be called after running the initialize method.
   *
   * @param context Initializer context object
   * @param linkConfiguration link configuration object
   * @param jobConfiguration job configuration object for the FROM and TO.
   *        In case of the FROM initializer this will represent the FROM job configuration.
   *        In case of the TO initializer this will represent the TO job configuration.
   * @return list of jars needed by the connector; empty by default
   */
  public List<String> getJars(InitializerContext context,
      LinkConfiguration linkConfiguration, JobConfiguration jobConfiguration) {
    return new LinkedList<String>();
  }

  /**
   * Return schema associated with the connector for FROM and TO.
   * By default we assume a null schema. Override the method if there is a
   * custom schema to provide either for FROM or TO.
   *
   * @param context Initializer context object
   * @param linkConfiguration link configuration object
   * @param jobConfiguration job configuration object for the FROM and TO.
   *        In case of the FROM initializer this will represent the FROM job configuration.
   *        In case of the TO initializer this will represent the TO job configuration.
   * @return schema for the connector; NullSchema by default
   */
  public Schema getSchema(InitializerContext context,
      LinkConfiguration linkConfiguration, JobConfiguration jobConfiguration) {
    return NullSchema.getInstance();
  }
}

{code}
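
For reference, the earlier suggestion (a flag on the FROM/TO initializer, defaulting to false, checked when the job is created) could have looked roughly like the sketch below. This is only an illustration under assumptions: SketchInitializer, supportsIncremental() and canCreateIncrementalJob() are hypothetical names and were never part of the Sqoop Initializer API.

```java
// Hypothetical illustration only; none of these names exist in Sqoop.
class IncrementalSupportSketch {

  // Stand-in for a connector's FROM or TO initializer.
  abstract static class SketchInitializer {
    // Assumed flag from the suggestion; the default is false.
    boolean supportsIncremental() {
      return false;
    }
  }

  // A connector side that only does a full fetch or a full write.
  static final class FullOnlyInitializer extends SketchInitializer {
  }

  // A connector side that can read or write delta records.
  static final class DeltaInitializer extends SketchInitializer {
    @Override
    boolean supportsIncremental() {
      return true;
    }
  }

  // Job-creation check: both the FROM and the TO side must support
  // incremental before an incremental job is accepted.
  static boolean canCreateIncrementalJob(SketchInitializer from, SketchInitializer to) {
    return from.supportsIncremental() && to.supportsIncremental();
  }
}
```

With this shape, an incremental job pairing a DeltaInitializer FROM with a FullOnlyInitializer TO would be rejected at creation time rather than at execution time.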



> Connector API : Ability for connector to indicate if its FROM and TO support 
> incremental reading/ writing
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-1799
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1799
>             Project: Sqoop
>          Issue Type: Sub-task
>            Reporter: Veena Basavaraj
>            Assignee: Veena Basavaraj
>             Fix For: 1.99.5
>
>
> No longer a necessity. If the connectors have delta read/write configs, we 
> will display them, and the connectors will use those config values to do the 
> appropriate form of reading from and writing to the data source. At this 
> point, having this in the initializer API does not seem necessary; we can 
> revisit if we need this information upfront for any form of validation when 
> the job is created.
> By default it is assumed the connectors will do a full fetch and a full write 
> from a clean slate.
> For instance, if the TO does not support delta records being written in some 
> fashion, but the FROM side only gave a subset of records, we cannot expect 
> delta append or merge (overwriting existing records with no dupes) to 
> happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
