[
https://issues.apache.org/jira/browse/SQOOP-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Veena Basavaraj updated SQOOP-1549:
-----------------------------------
Description:
Here is what happens today ( SQOOP-1367 ) when someone needs to write a
connector.
First they start looking at the connector api and sees that they need to
implement configuration classes. Well after some thinking they realize, they
need 3 classes. Why they wonder? But they continue on and implement 3 classes.
In some cases there is really nothing for Link Configuration, but they still
have to create this dummy class for a Configuration Class and then another
dummy one for config class, which if it were me would find it absurd.
Then after creating 3 configuration classes, they need to then create atleast 3
config classes. Note the use of word atleast. The api is not at all obvious in
telling them that they infact can create more than 3 config classes. It seems
like a hidden feature unless until someone sees some sample code where there is
more than one config class per configuration class. !!
The naming "getJobConfigurationClass" tells them nothing. You may say javadoc
could explain it, But I wonder why we need to even support 3 configuration
classes and more than 3 config classes.
{code}
/**
* @return Get link configuration class
*/
public abstract Class getLinkConfigurationClass();
/**
* @return Get job configuration group per direction type or null if not
supported
*/
public abstract Class getJobConfigurationClass(Direction jobType);
{code}
Here is my proposal ( if at all you want to support groups of configs, they
atleast name the class to "ConfiguratioGroup"
Here is how the apis makes it obvious, that this class can contain a group of
link configs
{code}
/**
* @return Get link configuration group class
*/
public abstract Class getLinkConfigurationGroupClass();
/**
* @return Get job configuration group class per direction type or null if
not supported
*/
public abstract Class getJobConfigurationGroupClass(Direction jobType);
{code}
[~abec] seems to need some validation from the group on why it should be called
"Group". I have explained my reasoning for this change in
https://reviews.apache.org/r/26295/
Alternatively I think the current design/ implementation to support config
parameters grouping is overkill ( over designed)
I prefer simple apis, less things for a developer to code and intuitive names
to everything they represent
1. Remove the ConfigList and support grouping of configs by the "group"
attribute on inputs
2. Have one configuration class annotation that will mandate 3 classes with
specific annotations attributes on it FromConfig, ToConfig and LinkConfig to be
filled.
So having one class, gives a complete picture of all configs this connector
uses/ provides. There is one resource bundle we require, so it maps to one
configuration class as well.
In code this is how it will look
{code}
*/
@ConfigurationGroupClass
public class HdfsCongifuration {
@Config public LinkConfig linkConfig;
@Config public FromJobConfig fromJobConfig;
@Config public ToJobConfig toJobConfig;
....
}
{code}
{code}
*/
@ConfigClass(validators = {@Validator(ToJobConfig.ConfigValidator.class)})
public class ToJobConfig {
@Input(size = 50, group="foo") public String schemaName;
@Input(size = 2000, group="bar") public String tableName;
@Input(size = 50) public String sql;
@Input(size = 50) public String columns;
@Input(size = 2000) public String stageTableName;
@Input public Boolean clearStageTable;
}
{code}
was:
Here is what happens today ( SQOOP-1367 ) when someone needs to write a
connector.
First they start looking at the connector api and sees that they need to
implement configuration classes. Well after some thinking they realize, they
need 3 classes. Why they wonder? But they continue on and implement 3 classes.
In some cases there is really nothing for Link Configuration, but they still
have to create this dummy class for a Configuration Class and then another
dummy one for config class, which if it were me would find it absurd.
Then after creating 3 configuration classes, they need to then create atleast 3
config classes. Note the use of word atleast. The api is not at all obvious in
telling them that they infact can create more than 3 config classes. It seems
like a hidden feature unless until someone sees some sample code where there is
more than one config class per configuration class. !!
The naming "getJobConfigurationClass" tells them nothing. You may say javadoc
could explain it, But I wonder why we need to even support 3 configuration
classes and more than 3 config classes.
{code}
/**
* @return Get link configuration class
*/
public abstract Class getLinkConfigurationClass();
/**
* @return Get job configuration group per direction type or null if not
supported
*/
public abstract Class getJobConfigurationClass(Direction jobType);
{code}
Here is my proposal ( if at all you want to support groups of configs, they
atleast name the class to "ConfiguratioGroup"
Here is how the apis makes it obvious, that this class can contain a group of
link configs
{code}
/**
* @return Get link configuration group class
*/
public abstract Class getLinkConfigurationGroupClass();
/**
* @return Get job configuration group class per direction type or null if
not supported
*/
public abstract Class getJobConfigurationGroupClass(Direction jobType);
{code}
[~abec] seems to need some validation from the group on why it should be called
"Group". I have explained my reasoning for this change in
https://reviews.apache.org/r/26295/
Alternatively I think the current design/ implementation to support config
parameters grouping is overkill ( over designed)
I prefer simple apis, less things for a developer to code and intuitive names
to everything they represent
1. Remove the ConfigList and support grouping of configs by the "group"
attribute on inputs
2. Have one configuration class annotation that will mandate 3 classes with
specific annotations attributes on it FromConfig, ToConfig and LinkConfig to be
filled.
So having one class, gives a complete picture of all configs this connector
uses/ provides. There is one resource bundle we require, so it maps to one
configuration class as well.
> Simplifying the Configuration class concept in Connector api
> ------------------------------------------------------------
>
> Key: SQOOP-1549
> URL: https://issues.apache.org/jira/browse/SQOOP-1549
> Project: Sqoop
> Issue Type: Sub-task
> Reporter: Veena Basavaraj
> Assignee: Veena Basavaraj
>
> Here is what happens today ( SQOOP-1367 ) when someone needs to write a
> connector.
> First they start looking at the connector api and sees that they need to
> implement configuration classes. Well after some thinking they realize, they
> need 3 classes. Why they wonder? But they continue on and implement 3
> classes. In some cases there is really nothing for Link Configuration, but
> they still have to create this dummy class for a Configuration Class and then
> another dummy one for config class, which if it were me would find it absurd.
> Then after creating 3 configuration classes, they need to then create atleast
> 3 config classes. Note the use of word atleast. The api is not at all
> obvious in telling them that they infact can create more than 3 config
> classes. It seems like a hidden feature unless until someone sees some sample
> code where there is more than one config class per configuration class. !!
> The naming "getJobConfigurationClass" tells them nothing. You may say javadoc
> could explain it, But I wonder why we need to even support 3 configuration
> classes and more than 3 config classes.
> {code}
> /**
> * @return Get link configuration class
> */
> public abstract Class getLinkConfigurationClass();
> /**
> * @return Get job configuration group per direction type or null if not
> supported
> */
> public abstract Class getJobConfigurationClass(Direction jobType);
> {code}
> Here is my proposal ( if at all you want to support groups of configs, they
> atleast name the class to "ConfiguratioGroup"
> Here is how the apis makes it obvious, that this class can contain a group of
> link configs
> {code}
> /**
> * @return Get link configuration group class
> */
> public abstract Class getLinkConfigurationGroupClass();
> /**
> * @return Get job configuration group class per direction type or null if
> not supported
> */
> public abstract Class getJobConfigurationGroupClass(Direction jobType);
> {code}
> [~abec] seems to need some validation from the group on why it should be
> called "Group". I have explained my reasoning for this change in
> https://reviews.apache.org/r/26295/
> Alternatively I think the current design/ implementation to support config
> parameters grouping is overkill ( over designed)
> I prefer simple apis, less things for a developer to code and intuitive names
> to everything they represent
> 1. Remove the ConfigList and support grouping of configs by the "group"
> attribute on inputs
> 2. Have one configuration class annotation that will mandate 3 classes with
> specific annotations attributes on it FromConfig, ToConfig and LinkConfig to
> be filled.
> So having one class, gives a complete picture of all configs this connector
> uses/ provides. There is one resource bundle we require, so it maps to one
> configuration class as well.
> In code this is how it will look
> {code}
> */
> @ConfigurationGroupClass
> public class HdfsCongifuration {
> @Config public LinkConfig linkConfig;
> @Config public FromJobConfig fromJobConfig;
> @Config public ToJobConfig toJobConfig;
>
> ....
> }
> {code}
> {code}
> */
> @ConfigClass(validators = {@Validator(ToJobConfig.ConfigValidator.class)})
> public class ToJobConfig {
> @Input(size = 50, group="foo") public String schemaName;
> @Input(size = 2000, group="bar") public String tableName;
> @Input(size = 50) public String sql;
> @Input(size = 50) public String columns;
> @Input(size = 2000) public String stageTableName;
> @Input public Boolean clearStageTable;
>
> }
> {code}
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)