[
https://issues.apache.org/jira/browse/SQOOP-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256634#comment-14256634
]
Veena Basavaraj edited comment on SQOOP-1804 at 12/23/14 5:52 AM:
------------------------------------------------------------------
1. This is over-optimization at this point; per Sqoop installation, it is hard
to see this becoming the bottleneck.
2. Counters are basically metrics that Hadoop exposes; they are not metadata, IMO.
{quote}
Metadata is data that describes other data. For instance, the data type and
data attributes such as "is nullable" or "is sensitive" fall in this category.
{quote}
The output referred to in this ticket is not metrics; it is in fact output from
the current job execution, derived data that we want to store. It is not even
metadata.
Metadata in the Hadoop world may mean other things, but I'd not be inclined to
call this metadata just because other systems do.
Having said that,
we can get other people's input; feel free to ping others who have input.
I am still not convinced that everything is a counter or metadata.
Also, I'd like to get to "A" resolution on this quickly and not stretch this
too far. We may not solve every use case, but I am convinced that the ability to
have U and R operations on this "derived" data makes it different from
counters/metrics, which are read-only.
We also have a separate ticket to address metrics such as how long a job took. I
don't think we need to make a strong separation between derived data and
metrics. It's up to the connector to choose to expose whatever it needs that
makes its next job run easier to perform.
was (Author: vybs):
1. This is over-optimization at this point; per Sqoop installation, it is hard
to see this becoming the bottleneck.
2. Counters are basically metrics that Hadoop exposes; they are not metadata, IMO.
{quote}
Metadata is data that describes other data. For instance, the data type and
data attributes such as "is nullable" or "is sensitive" fall in this category.
{quote}
The output referred to in this ticket is not metrics; it is in fact output from
the current job execution, derived data that we want to store. It is not even
metadata.
Metadata in the Hadoop world may mean other things, but I'd not be inclined to
call this metadata just because other systems do.
Having said that,
we can get other people's input; feel free to ping others who have input.
I am still not convinced that everything is a counter or metadata.
> Repository Structure + API: Storing/Retrieving the From/To state of the
> incremental read/write
> ------------------------------------------------------------------------------------------------
>
> Key: SQOOP-1804
> URL: https://issues.apache.org/jira/browse/SQOOP-1804
> Project: Sqoop
> Issue Type: Sub-task
> Reporter: Veena Basavaraj
> Assignee: Veena Basavaraj
> Fix For: 1.99.5
>
>
> Details of this proposal are in the wiki.
> https://cwiki.apache.org/confluence/display/SQOOP/Delta+Fetch+And+Merge+Design#DeltaFetchAndMergeDesign-Wheretostoretheoutputinsqoop?
> Will use this ticket for more detailed discussion on storage options for the
> output from connectors
> 1.
> {code}
> // Will have an FK to the submission table
> public static final String QUERY_CREATE_TABLE_SQ_JOB_OUTPUT_SUBMISSION =
>   "CREATE TABLE " + TABLE_SQ_JOB_OUTPUT + " ("
>   + COLUMN_SQ_JOB_OUT_ID + " BIGINT GENERATED ALWAYS AS IDENTITY "
>   + "(START WITH 1, INCREMENT BY 1), "
>   + COLUMN_SQ_JOB_OUT_KEY + " VARCHAR(32), "
>   + COLUMN_SQ_JOB_OUT_VALUE + " LONG VARCHAR, "
>   + COLUMN_SQ_JOB_OUT_TYPE + " VARCHAR(32), "
>   // FK to the direction table; distinguishes output from the FROM and TO
>   // parts of the job
>   + COLUMN_SQD_ID + " VARCHAR(32), "
>   + COLUMN_SQRS_SUBMISSION + " BIGINT, "
>   + "CONSTRAINT " + CONSTRAINT_SQRS_SQS + " "
>   + "FOREIGN KEY (" + COLUMN_SQRS_SUBMISSION + ") "
>   + "REFERENCES " + TABLE_SQ_SUBMISSION + "(" + COLUMN_SQS_ID + ") "
>   + "ON DELETE CASCADE)";
> {code}
> 2.
> At the code level, we will define MOutputType. One of the types can be BLOB,
> in case a connector decides to store the value as a BLOB.
> {code}
> class JobOutput {
>   String key;
>   Object value;
>   MOutputType type;
> }
> {code}
> 3.
> At the repository level, add a new API to get the job output for a particular
> submission ID and to allow updates to its values.
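
As a rough illustration of points 2 and 3 in the description, the read (R) and update (U) operations could look like the sketch below. It is only an in-memory stand-in under stated assumptions, not the actual Sqoop repository code: the enum constants other than BLOB, the `direction` field, the class `InMemoryJobOutputRepository`, and the method names `create`, `findJobOutputs`, and `updateJobOutputValue` are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value types; the proposal only names BLOB explicitly.
enum MOutputType { STRING, LONG, BLOB }

// Mirrors the JobOutput class from the proposal. The direction field is an
// assumption that stands in for the FK to the direction table.
final class JobOutput {
  final String key;
  Object value;            // mutable, because the proposal allows updates
  final MOutputType type;
  final String direction;  // "FROM" or "TO"

  JobOutput(String key, Object value, MOutputType type, String direction) {
    this.key = key;
    this.value = value;
    this.type = type;
    this.direction = direction;
  }
}

// Hypothetical in-memory stand-in for the repository API: outputs are grouped
// by submission ID, then keyed by direction and output key.
final class InMemoryJobOutputRepository {
  private final Map<Long, Map<String, JobOutput>> bySubmission = new LinkedHashMap<>();

  private static String compositeKey(String direction, String key) {
    return direction + ":" + key;
  }

  // Stores one output row for the given submission.
  void create(long submissionId, JobOutput output) {
    bySubmission
        .computeIfAbsent(submissionId, id -> new LinkedHashMap<>())
        .put(compositeKey(output.direction, output.key), output);
  }

  // R: returns all outputs recorded for a submission (empty if none).
  List<JobOutput> findJobOutputs(long submissionId) {
    Map<String, JobOutput> outputs = bySubmission.get(submissionId);
    return outputs == null ? new ArrayList<>() : new ArrayList<>(outputs.values());
  }

  // U: replaces the value of an existing output; returns false if not found.
  boolean updateJobOutputValue(long submissionId, String direction, String key, Object newValue) {
    Map<String, JobOutput> outputs = bySubmission.get(submissionId);
    if (outputs == null) {
      return false;
    }
    JobOutput existing = outputs.get(compositeKey(direction, key));
    if (existing == null) {
      return false;
    }
    existing.value = newValue;
    return true;
  }
}

final class JobOutputSketch {
  public static void main(String[] args) {
    InMemoryJobOutputRepository repo = new InMemoryJobOutputRepository();
    // The FROM side records the last value it read during submission 1.
    repo.create(1L, new JobOutput("last.value", 100L, MOutputType.LONG, "FROM"));
    // Later the connector corrects that value.
    repo.updateJobOutputValue(1L, "FROM", "last.value", 250L);
    for (JobOutput out : repo.findJobOutputs(1L)) {
      System.out.println(out.direction + " " + out.key + " = " + out.value);
    }
  }
}
```

Unlike counters, the stored value can be changed after it is written, which is the distinction the comment above draws between derived data and read-only metrics.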