[ https://issues.apache.org/jira/browse/SQOOP-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284055#comment-14284055 ]

Veena Basavaraj edited comment on SQOOP-2022 at 1/20/15 5:23 PM:
-----------------------------------------------------------------

To solve the unsigned int / unsigned float use case we have two options:

1. Do not look at the byteSize/signed values in the schema and use a plain 
instanceof check in Java (see the sketch below). 

Pros:
Keeps Sqoop's handling of object arrays simple.
Cons:
We cannot validate when CSV or any other native format is given, since, for 
example with JSON, we may not know whether to store a value as INT or as 
LONG. We also do not use the schema fields at all; in some cases we do not 
know the type, and we may run blind until some exception is thrown somewhere 
down in application code.
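
A minimal sketch of what option 1 amounts to (class and method names here are 
hypothetical illustrations, not actual Sqoop code): the object-array path keys 
purely off the Java runtime type and never consults the schema's 
byteSize/signed fields.

{code:java}
// Hypothetical illustration of option 1 (not actual Sqoop code): the encoding
// path switches on the Java runtime type with plain instanceof checks and
// never consults the schema's byteSize/signed metadata.
public final class PlainInstanceofEncoding {

  static String encodeFixedPoint(Object value) {
    if (value instanceof Integer || value instanceof Long) {
      return value.toString();
    }
    throw new IllegalArgumentException(
        "Unexpected type for FIXEDPOINT column: " + value.getClass());
  }

  public static void main(String[] args) {
    System.out.println(encodeFixedPoint(42));   // Integer is accepted
    System.out.println(encodeFixedPoint(42L));  // Long is accepted
    // When parsing CSV/JSON text there is no runtime type to check, so this
    // approach cannot tell INT from LONG and cannot validate unsigned ranges.
  }
}
{code}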

2. Use the byteSize field (fix Sqoop to use byteSize instead of the bitSize 
we use now) together with the signed field; see the sketch below. This means 
something like the GenericJDBC Connector may not be able to handle MySQL 
unsigned ints, but that is ok: the GenericJDBC Connector will be limited to 
certain types, i.e. it can only handle signed ints and signed floats.

Pros:
We actually make use of the schema fields as intended in the IDF design. 
Cons:
Connectors need to handle different types and set the schema byteSize and 
signed fields appropriately. In the case of MySQL, the connector needs to map 
the UNSIGNED INT type to a FIXEDPOINT schema column with byteSize=8L and 
signed=false, so these values are treated as LONG in Java.

Since Java does not have signed/unsigned variants of its types, this is the 
limitation: the GenericJDBC Connector cannot map a single INTEGER type to 
both signed and unsigned columns.
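
A rough sketch of option 2 under the assumptions above (simplified stand-ins, 
not the real FixedPoint/SqoopIDFUtils classes): the connector widens MySQL 
UNSIGNED INT to byteSize=8 with signed=false, and the IDF side picks Integer 
vs Long from those two schema fields.

{code:java}
// Hypothetical, simplified sketch of option 2; this does not use the real
// Sqoop FixedPoint/SqoopIDFUtils API, only the byteSize/signed idea.
public final class ByteSizeBasedDecoding {

  // Stand-in for the schema's FIXEDPOINT column metadata.
  static final class FixedPointColumn {
    final long byteSize;   // size in bytes, not bits
    final boolean signed;
    FixedPointColumn(long byteSize, boolean signed) {
      this.byteSize = byteSize;
      this.signed = signed;
    }
  }

  // Connector-side mapping: a MySQL UNSIGNED INT cannot fit in a signed
  // 32-bit Java int, so it is widened to byteSize=8 and marked unsigned,
  // which the IDF side then treats as a Java Long.
  static FixedPointColumn mapMysqlIntType(boolean unsigned) {
    return unsigned ? new FixedPointColumn(8L, false)
                    : new FixedPointColumn(4L, true);
  }

  // IDF-side decoding: byteSize and signed together decide Integer vs Long.
  static Object decodeFixedPoint(FixedPointColumn column, String csvValue) {
    if (column.byteSize <= 4 && column.signed) {
      return Integer.valueOf(csvValue);
    }
    return Long.valueOf(csvValue);
  }

  public static void main(String[] args) {
    // 3000000000 overflows a signed 32-bit int but fits in a Java long.
    Object v = decodeFixedPoint(mapMysqlIntType(true), "3000000000");
    System.out.println(v + " (" + v.getClass().getSimpleName() + ")"); // Long
    Object w = decodeFixedPoint(mapMysqlIntType(false), "42");
    System.out.println(w + " (" + w.getClass().getSimpleName() + ")"); // Integer
  }
}
{code}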

[~abec] agreed that we should go with option 2.


was (Author: vybs):
To solve the unsigned int / unsigned float use case we have two options:

1. Do not look at the byteSize/signed values in the schema and use a plain 
instanceof check in Java. 

Pros:
Keeps Sqoop's handling of object arrays simple.
Cons:
We cannot validate when CSV or any other native format is given, since, for 
example with JSON, we may not know whether to store a value as INT or as 
LONG. We also do not use the schema fields at all; in some cases we do not 
know the type, and we may run blind until some exception is thrown somewhere 
down in application code.

2. Use the byteSize field (fix Sqoop to use byteSize instead of the bitSize 
we use now) together with the signed field. This means something like the 
JDBC Connector may not be able to handle MySQL unsigned ints, but that is ok; 
the JDBC Connector will be limited to only certain types.

Pros:
We actually make use of the schema fields as intended in the IDF design. 
Cons:
Connectors need to handle different types and set the schema byteSize and 
signed fields appropriately. In the case of MySQL, the connector needs to map 
the UNSIGNED INT type to a FIXEDPOINT schema column with byteSize=4L and 
signed=false. 

[~abec] agreed that we should go with option 2.

> Sqoop2: SqoopIDFUtils uses bit size instead of byteSize to check for 
> INT/LONG/FLOAT/Double
> ------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-2022
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2022
>             Project: Sqoop
>          Issue Type: Bug
>            Reporter: Veena Basavaraj
>            Assignee: Veena Basavaraj
>             Fix For: 1.99.5
>
>         Attachments: SQOOP-2022-v1.patch, SQOOP-2022.patch
>
>
> From SQOOP-2018, [~stanleyxu2005] found a good issue: we use BIT size 
> instead of BYTE size, and we have to fix this.
> Also see https://issues.apache.org/jira/browse/SQOOP-2023 for more context.
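
For context on the bug quoted above, a hedged sketch (not the actual 
SqoopIDFUtils source) of the bit-vs-byte mix-up: comparing a schema byteSize 
against a bit count such as Integer.SIZE classifies even 8-byte values as INT.

{code:java}
// Illustrative only; this is not the actual SqoopIDFUtils code, just a sketch
// of the bit-vs-byte confusion described in the issue summary above.
public final class BitVsByteSizeCheck {

  // Buggy variant: byteSize is measured in bytes, but it is compared against
  // Integer.SIZE, which is 32 *bits*, so even an 8-byte column passes as INT.
  static Object parseBuggy(long byteSize, String value) {
    return (byteSize <= Integer.SIZE) ? Integer.valueOf(value) : Long.valueOf(value);
  }

  // Fixed variant: compare against a byte count (4 bytes for a Java int).
  static Object parseFixed(long byteSize, String value) {
    return (byteSize <= 4) ? Integer.valueOf(value) : Long.valueOf(value);
  }

  public static void main(String[] args) {
    // An 8-byte FIXEDPOINT value larger than Integer.MAX_VALUE:
    System.out.println(parseFixed(8L, "3000000000"));  // parsed as Long, fine
    try {
      parseBuggy(8L, "3000000000");
    } catch (NumberFormatException e) {
      System.out.println("bit-size check tried to parse it as an int: " + e);
    }
  }
}
{code}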



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
