[ https://issues.apache.org/jira/browse/SQOOP-3068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15728898#comment-15728898 ]

Raghunath commented on SQOOP-3068:
----------------------------------

Hi, I am trying to create an issue. Can you please let me know how to create a JIRA ticket?
Sqoop fails randomly. While running a Sqoop export, it fails with:
ORA-01438: value larger than specified precision allowed for this column
We tried splitting the data and running Sqoop on the pieces, and were able to see which row is failing.
However, if we export only that failed row by itself, the export succeeds.
When we iterate up to the failing row again, it fails on a different row.
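
ORA-01438 is raised when a value has more digits than the target Oracle column's declared NUMBER(precision, scale) allows. As a hedged sketch (not from this report), one way to check the declared precision and scale of the target table's columns before exporting; the connection variables and the table name 'T1' are placeholders:

{noformat}
sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select column_name, data_type, data_precision, data_scale from all_tab_columns where table_name = 'T1'"
{noformat}

Any source value whose integer digits exceed (precision - scale) for its column cannot fit and will raise ORA-01438.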

> Enhance error (tool.ImportTool: Encountered IOException running import job: 
> java.io.IOException: Expected schema) to suggest workaround 
> (--map-column-java)
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-3068
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3068
>             Project: Sqoop
>          Issue Type: Improvement
>          Components: hive-integration
>    Affects Versions: 1.4.6
>            Reporter: Markus Kemper
>            Assignee: Szabolcs Vasas
>         Attachments: SQOOP-3068.patch, SQOOP-3068.patch
>
>
> Please consider enhancing the error to include more detail and to suggest a
> workaround (--map-column-java).
> Sqoop (import + --hive-import + --as-parquetfile) can fail due to a mismatch
> between the JSON schema that Hive produces and the JSON schema that Sqoop
> generates.  The test case below demonstrates how to reproduce the issue as
> well as how to work around it.
> SETUP (create parquet table with Sqoop import and Beeline CTAS)
> {noformat}
> STEP 01 - Create MySQL Tables
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "drop table t1"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "create table t1 (c_int int, c_date date, c_timestamp timestamp)"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "describe t1"
> ---------------------------------------------------------------------------------------------------------
> | Field                | Type                 | Null | Key | Default              | Extra                | 
> ---------------------------------------------------------------------------------------------------------
> | c_int                | int(11)              | YES  |     | (null)               |                      | 
> | c_date               | date                 | YES  |     | (null)               |                      | 
> | c_timestamp          | timestamp            | NO   |     | CURRENT_TIMESTAMP    | on update CURRENT_TIMESTAMP | 
> ---------------------------------------------------------------------------------------------------------
> STEP 02 : Insert and Select Row
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "insert into t1 values (1, current_date(), current_timestamp())"
> sqoop eval --connect $MYCONN --username $MYUSER --password $MYPSWD --query "select * from t1"
> --------------------------------------------------
> | c_int       | c_date     | c_timestamp         | 
> --------------------------------------------------
> | 1           | 2016-10-26 | 2016-10-26 14:30:33.0 | 
> --------------------------------------------------
> STEP 03 : Create Hive Tables
> beeline -u jdbc:hive2:// -e "use default; drop table t1"
> sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hcatalog-database default --hcatalog-table t1_text --create-hcatalog-table --hcatalog-storage-stanza 'stored as parquet' --num-mappers 1
> beeline -u jdbc:hive2:// -e "use default; create table t1 stored as parquet as select * from t1_text; show create table t1;"
> +----------------------------------------------------+--+
> |                   createtab_stmt                   |
> +----------------------------------------------------+--+
> | CREATE TABLE `t1`(                                 |
> |   `c_int` int,                                     |
> |   `c_date` string,                                 |
> |   `c_timestamp` string)                            |
> | ROW FORMAT SERDE                                   |
> |   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  |
> | STORED AS INPUTFORMAT                              |
> |   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'  |
> | OUTPUTFORMAT                                       |
> |   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
> | LOCATION                                           |
> |   'hdfs://<namenode>:8020/user/hive/warehouse/t1' |
> | TBLPROPERTIES (                                    |
> |   'COLUMN_STATS_ACCURATE'='true',                  |
> |   'numFiles'='1',                                  |
> |   'numRows'='2',                                   |
> |   'rawDataSize'='6',                               |
> |   'totalSize'='605',                               |
> |   'transient_lastDdlTime'='1478298298')            |
> +----------------------------------------------------+--+
> {noformat}
> REPRODUCE ISSUE (import --hive-import append and overwrite)
> {noformat}
> STEP 01: Attempt --hive-import --append
> sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hive-import --hive-database default --hive-table t1 --as-parquetfile --num-mappers 1 --append
> 16/11/02 20:54:23 WARN mapreduce.DataDrivenImportJob: Target Hive table 't1' exists! Sqoop will append data into the existing Hive table. Consider using --hive-overwrite, if you do NOT intend to do appending.
> 16/11/02 20:54:24 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Expected schema: {"type":"record","name":"t1","fields":[{"name":"c_int","type":["null","int"],"doc":"Converted from 'int'","default":null},{"name":"c_date","type":["null","string"],"doc":"Converted from 'string'","default":null},{"name":"c_timestamp","type":["null","string"],"doc":"Converted from 'string'","default":null}]}
> Actual schema: {"type":"record","name":"t1","doc":"Sqoop import of t1","fields":[{"name":"c_int","type":["null","int"],"default":null,"columnName":"c_int","sqlType":"4"},{"name":"c_date","type":["null","long"],"default":null,"columnName":"c_date","sqlType":"91"},{"name":"c_timestamp","type":["null","long"],"default":null,"columnName":"c_timestamp","sqlType":"93"}],"tableName":"t1"}
> STEP 02: Attempt --hive-import --hive-overwrite
> sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hive-import --hive-database default --hive-table t1 --as-parquetfile --num-mappers 1 --hive-overwrite
> 16/11/02 20:56:55 INFO hive.metastore: Connected to metastore.
> 16/11/02 20:56:56 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Expected schema: {"type":"record","name":"t1","fields":[{"name":"c_int","type":["null","int"],"doc":"Converted from 'int'","default":null},{"name":"c_date","type":["null","string"],"doc":"Converted from 'string'","default":null},{"name":"c_timestamp","type":["null","string"],"doc":"Converted from 'string'","default":null}]}
> Actual schema: {"type":"record","name":"t1","doc":"Sqoop import of t1","fields":[{"name":"c_int","type":["null","int"],"default":null,"columnName":"c_int","sqlType":"4"},{"name":"c_date","type":["null","long"],"default":null,"columnName":"c_date","sqlType":"91"},{"name":"c_timestamp","type":["null","long"],"default":null,"columnName":"c_timestamp","sqlType":"93"}],"tableName":"t1"}
> {noformat}
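> A hedged sketch of the workaround named in the summary: --map-column-java can force the Avro schema that Sqoop generates to use string for the mismatched columns. Mapping c_date and c_timestamp to String is an assumption based on the expected schema in the errors above, which declares both columns as "string"; the command below reuses the variables from the test case and is not verified output.
> {noformat}
> sqoop import -Dmapreduce.map.log.level=DEBUG --connect $MYCONN --username $MYUSER --password $MYPSWD --table t1 --hive-import --hive-database default --hive-table t1 --as-parquetfile --num-mappers 1 --hive-overwrite --map-column-java c_date=String,c_timestamp=String
> {noformat}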



