[
https://issues.apache.org/jira/browse/SQOOP-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284055#comment-14284055
]
Veena Basavaraj edited comment on SQOOP-2022 at 1/20/15 5:36 PM:
-----------------------------------------------------------------
To solve the unsigned int / unsigned float use case we have 3 options:
1. Ignore the byteSize/signed values in the schema and use a plain
instanceof check in Java.
Pros:
Keeps Sqoop's handling of object arrays simple. However, we cannot validate
when CSV or any other native format is given; in the case of JSON, for
example, we may not know what to store as INT vs. what to store as LONG.
Cons:
We do not use the schema fields at all, and in cases where we do not know the
type we run blind until some exception is thrown somewhere down in
application code.
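
A minimal sketch of what option 1 amounts to, for illustration only. The class
and method names are hypothetical and are not taken from SqoopIDFUtils; the
point is that the check sees only the runtime type of each object, so a value
that arrives as text (e.g. from CSV) tells us nothing about INT vs. LONG.

```java
// Hypothetical illustration of option 1: classify by runtime type only,
// without consulting byteSize or signed from the schema.
final class InstanceofCheckSketch {

    static String columnKind(Object value) {
        if (value instanceof Integer) return "INT";
        if (value instanceof Long) return "LONG";
        if (value instanceof Float) return "FLOAT";
        if (value instanceof Double) return "DOUBLE";
        // Text input carries no width information at all.
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        // The last element stands in for a field read from a CSV record.
        Object[] row = {42, 42L, 1.5f, 1.5d, "42"};
        for (Object value : row) {
            System.out.println(value + " -> " + columnKind(value));
        }
    }
}
```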
2. Use byteSize (fix Sqoop to use byteSize instead of the bitSize we use
now) and the signed field. This means something like the GenericJDBC Connector
may not be able to handle MySQL unsigned ints, but that is OK: the
GenericJDBC Connector will be limited to certain types, i.e. it can only
handle signed ints and signed floats.
This also means we explicitly make byteSize and signed mandatory, so that
defaults do not come into play. If not, we should make "signed" = true the
default. I am not sure whether the default should be INT or LONG; at this
point LONG is the default if byteSize is null.
Pros:
We actually make use of the schema fields as intended in the IDF design.
Cons:
Connectors need to handle different types and set the schema byteSize and
signed fields appropriately. In the case of MySQL, the connector needs to map
the UNSIGNED INT type to a schema of FIXEDPOINT with byteSize=8L and
signed=false, so these values are treated as LONG in Java.
The limitation is that Java has no signed/unsigned distinction for its
integer types, so the single INTEGER type in the GenericJDBC Connector cannot
map to both signed and unsigned.
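
A rough sketch of how option 2 could resolve the Java type, under stated
assumptions. The names below are hypothetical and do not reflect the actual
Sqoop classes. The 4-byte threshold for INT and the handling of unsigned
columns are assumptions; the fallbacks (LONG when byteSize is null, signed =
true when signed is unset) follow what is described above. The range check
shows why MySQL UNSIGNED INT needs byteSize=8.

```java
// Hypothetical illustration of option 2: derive the Java type from the
// schema's byteSize and signed fields rather than from the object itself.
final class FixedPointSketch {

    // Largest value of a 32-bit unsigned integer (MySQL UNSIGNED INT): 2^32 - 1.
    static final long MYSQL_UNSIGNED_INT_MAX = 4294967295L;

    static String javaType(Long byteSize, Boolean signed) {
        // Proposed default from the comment: treat an unset flag as signed.
        boolean isSigned = (signed == null) || signed;
        // Current behaviour from the comment: LONG when byteSize is null.
        if (byteSize == null) return "LONG";
        // Assumption: only signed values of at most 4 bytes go into a Java int.
        if (isSigned && byteSize <= Integer.BYTES) return "INT";
        return "LONG";
    }

    // True if value fits in a signed two's-complement integer of byteSize bytes.
    static boolean fitsInSignedBytes(long value, int byteSize) {
        if (byteSize >= Long.BYTES) return true;
        long max = (1L << (byteSize * 8 - 1)) - 1;
        long min = -max - 1;
        return value >= min && value <= max;
    }

    public static void main(String[] args) {
        System.out.println(javaType(4L, true));    // signed 4-byte column
        System.out.println(javaType(8L, false));   // MySQL UNSIGNED INT mapping
        System.out.println(javaType(null, null));  // defaults only
        System.out.println(fitsInSignedBytes(MYSQL_UNSIGNED_INT_MAX, 4));
        System.out.println(fitsInSignedBytes(MYSQL_UNSIGNED_INT_MAX, 8));
    }
}
```

Because 4294967295 exceeds Integer.MAX_VALUE (2147483647), the connector has
to declare the wider size so that the value is carried as a Java long.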
[~abec] agreed that we should go with option 2.
> Sqoop2: SqoopIDFUtils uses bit size instead of byteSize to check for
> INT/LONG/FLOAT/Double
> ------------------------------------------------------------------------------------------
>
> Key: SQOOP-2022
> URL: https://issues.apache.org/jira/browse/SQOOP-2022
> Project: Sqoop
> Issue Type: Bug
> Reporter: Veena Basavaraj
> Assignee: Veena Basavaraj
> Fix For: 1.99.5
>
> Attachments: SQOOP-2022-v1.patch, SQOOP-2022.patch
>
>
> From SQOOP-2018: [~stanleyxu2005] found a real issue where we use BIT
> SIZE instead of BYTE size. We have to fix this.
> Also see https://issues.apache.org/jira/browse/SQOOP-2023 for more context.