[ https://issues.apache.org/jira/browse/SQOOP-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641272#comment-14641272 ]

Venkat Ramachandran edited comment on SQOOP-2387 at 7/25/15 12:10 AM:
----------------------------------------------------------------------

Attaching another patch that takes a different approach from the first one.
Made sure all the unit tests pass by running
ant test

Sqoop cleanses column names (transforming special characters into '_') when
generating the ORM class. This works fine when the destination is HDFS (in
either text or Avro format).

But it fails when the destination is Hive/HCAT, because the generated DDL has
the original database column names, whereas the ORM/record reader generates
cleansed column names. This patch uses the cleansed column names when creating
the DDL for Hive/HCAT.

IMO, this way the column names are consistent whether the destination is Avro
or Hive/HCAT (with special characters replaced by '_').
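
For illustration, here is a minimal sketch of the kind of cleansing involved;
ColumnCleanseSketch is a hypothetical helper, not Sqoop's actual code:

{code}
// Hypothetical illustration: replace any character that is not legal in a
// Java identifier with '_', as described above. Not Sqoop's implementation.
public class ColumnCleanseSketch {
  public static String cleanse(String columnName) {
    StringBuilder sb = new StringBuilder();
    for (char c : columnName.toCharArray()) {
      sb.append(Character.isJavaIdentifierPart(c) ? c : '_');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(cleanse("mg-version")); // mg_version
    System.out.println(cleanse("order date")); // order_date
  }
}
{code}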
 


was (Author: me.venkatr):
Attaching another patch with which all the unit tests pass (including the Avro
import tests). The approach here is different from the first patch.

Sqoop applies column cleansing that transforms the column names when generating
the ORM class, and this works well end-to-end when the output is HDFS (text or
Avro).

But it does not work when the destination is Hive/HCAT, because the DDL
contains the original database column names. This patch uses the cleansed
column names when creating the DDL for Hive/HCAT.

IMO, this way the column names are consistent whether the destination is Avro
or Hive/HCAT (with special characters replaced by '_').
 

> NPE thrown when sqoop tries to import table with column name containing some 
> special character
> ----------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-2387
>                 URL: https://issues.apache.org/jira/browse/SQOOP-2387
>             Project: Sqoop
>          Issue Type: Bug
>          Components: hive-integration
>    Affects Versions: 1.4.5, 1.4.6
>         Environment: HDP 2.2.0.0-2041
>            Reporter: Pavel Benes
>            Priority: Critical
>         Attachments: SQOOP-2387.1.patch, SQOOP-2387.2.patch, 
> SQOOP-2387.patch, joblog.txt, sqoop.log
>
>
> This Sqoop import:
> {code}
> sqoop import --connect jdbc:mysql://some.merck.com:1234/dbname --username XXX 
> --password YYY --table some_table --hcatalog-database some_database 
> --hcatalog-table some_table --hive-partition-key mg_version 
> --hive-partition-value 2015-05-28-13-18 -m 1 --verbose --fetch-size 
> -2147483648
> {code}
> fails with this error:
> {code}
> 2015-06-01 13:20:39,209 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.lang.NullPointerException
>       at 
> org.apache.hive.hcatalog.data.schema.HCatSchema.get(HCatSchema.java:105)
>       at 
> org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper.convertToHCatRecord(SqoopHCatImportHelper.java:194)
>       at 
> org.apache.sqoop.mapreduce.hcat.SqoopHCatImportMapper.map(SqoopHCatImportMapper.java:52)
>       at 
> org.apache.sqoop.mapreduce.hcat.SqoopHCatImportMapper.map(SqoopHCatImportMapper.java:34)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> {code}
> It seems that the error is caused by a column name containing a hyphen ('-').
> Column names are converted to Java identifiers, but later the converted name
> cannot be found in the HCatalog schema.
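>
> A self-contained sketch of that mismatch, assuming hypothetical names (a
> plain Map stands in for the HCatalog schema; not the actual code paths):
> {code}
> import java.util.HashMap;
> import java.util.Map;
>
> public class SchemaMismatchSketch {
>   public static void main(String[] args) {
>     // Schema keyed by the ORIGINAL database column name, as in the DDL.
>     Map<String, Integer> schemaPositions = new HashMap<>();
>     schemaPositions.put("mg-version", 0);
>
>     // The ORM/record reader looks up the CLEANSED name instead.
>     String cleansed = "mg-version".replace('-', '_'); // "mg_version"
>     Integer pos = schemaPositions.get(cleansed);      // null: not found
>     int unboxed = pos; // throws java.lang.NullPointerException, as in the log
>   }
> }
> {code}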



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
