[
https://issues.apache.org/jira/browse/SQOOP-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256634#comment-14256634
]
Veena Basavaraj commented on SQOOP-1804:
----------------------------------------
1. over optimization at this point, per sqoop installation hard to see this
will become the bottleneck
2. Counters are basically metrics that hadoop exposes, it is not metadata IMO.
{quote}
Metadata is data that describes other data. For instance, the data type, the
dat attributed such as is nullable, is sensitive fall in this category
{quote}
The output referred in this ticket is not metrics, it is infact output from the
current job execution, derived data that we want to store. It is not even
metadata.
Metadata is the hadoop world may mean other things, but I'd be not inclined to
call it metadata because other systems do it.
Having said so,
We can get other people inputs, feel free to ping others that have inputs.
Still not convince the everything is a counter or a meta data.
> Respository Structure + API: Storing/Retrieving the From/To state of the
> incremental read/ write
> ------------------------------------------------------------------------------------------------
>
> Key: SQOOP-1804
> URL: https://issues.apache.org/jira/browse/SQOOP-1804
> Project: Sqoop
> Issue Type: Sub-task
> Reporter: Veena Basavaraj
> Assignee: Veena Basavaraj
> Fix For: 1.99.5
>
>
> Details of this proposal are in the wiki.
> https://cwiki.apache.org/confluence/display/SQOOP/Delta+Fetch+And+Merge+Design#DeltaFetchAndMergeDesign-Wheretostoretheoutputinsqoop?
> Will use this ticket for more detailed discussion on storage options for the
> output from connectors
> 1.
> {code}
> // will have FK to submission
> public static final String QUERY_CREATE_TABLE_SQ_JOB_OUTPUT_SUBMISSION =
> "CREATE TABLE " + TABLE_SQ_JOB_OUTPUT + " ("
> + COLUMN_SQ_JOB_OUT_ID + " BIGINT GENERATED ALWAYS AS IDENTITY (START
> WITH 1, INCREMENT BY 1), "
> + COLUMN_SQ_JOB_OUT_KEY + " VARCHAR(32), "
> + COLUMN_SQ_JOB_OUT_VALUE + " LONG VARCHAR,"
> + COLUMN_SQ_JOB_OUT_TYPE + " VARCHAR(32),"
> + COLUMN_SQD_ID + " VARCHAR(32)," // FK to the direction table, since
> this allows to distinguish output from FROM/ TO part of the job
> + COLUMN_SQRS_SUBMISSION + " BIGINT, "
> + "CONSTRAINT " + CONSTRAINT_SQRS_SQS + " "
> + "FOREIGN KEY (" + COLUMN_SQRS_SUBMISSION + ") "
> + "REFERENCES " + TABLE_SQ_SUBMISSION + "(" + COLUMN_SQS_ID + ") ON
> DELETE CASCADE "
> {code}
> 2.
> At the code level, we will define MOutputType, one of the types can be BLOB
> as well, if a connector decides to store the value as a BLOB
> {code}
> class JobOutput {
> String key;
> Object value;
> MOutputType type;
> }
> {code}
> 3.
> At the repository API, add a new API to get job output for a particular
> submission Id and allow updates on values.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)