[
https://issues.apache.org/jira/browse/SQOOP-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Venkat Ramachandran updated SQOOP-2387:
---------------------------------------
Attachment: SQOOP-2387.2.patch
Attaching another patch with all unit tests passing (including the Avro import
tests). The approach here differs from the first patch.
Sqoop applies a column-cleansing step that transforms column names when
generating the ORM class, and this works end-to-end when the output is HDFS
(text or Avro).
But it does not work when the destination is Hive/HCatalog, because the DDL
contains the original database column names. This patch uses the cleansed
column names when creating the DDL for Hive/HCatalog.
IMO, this way the column names are consistent whether the target is Avro or
Hive/HCatalog (with special characters replaced by _).
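For illustration, the kind of cleansing described above can be sketched as a
standalone method (this is a hypothetical sketch, not Sqoop's actual ORM code;
the class and method names here are made up):

```java
// Hypothetical sketch of column-name cleansing: replace any character that
// is not valid in a Java identifier with '_', and prefix names that start
// with a digit. Not Sqoop's actual implementation.
public class ColumnCleaner {
    public static String cleanse(String column) {
        StringBuilder sb = new StringBuilder();
        for (char c : column.toCharArray()) {
            // Keep letters, digits, and underscores; replace everything else.
            sb.append(Character.isLetterOrDigit(c) || c == '_' ? c : '_');
        }
        String out = sb.toString();
        // A Java identifier cannot start with a digit.
        if (!out.isEmpty() && Character.isDigit(out.charAt(0))) {
            out = "_" + out;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(cleanse("mg-version")); // prints "mg_version"
    }
}
```

With this patch the same cleansed name (e.g. a hyphen mapped to "_") would
appear both in the generated ORM class and in the Hive/HCatalog DDL, so the
schema lookup no longer misses.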
> NPE thrown when sqoop tries to import table with column name containing some
> special character
> ----------------------------------------------------------------------------------------------
>
> Key: SQOOP-2387
> URL: https://issues.apache.org/jira/browse/SQOOP-2387
> Project: Sqoop
> Issue Type: Bug
> Components: hive-integration
> Affects Versions: 1.4.5, 1.4.6
> Environment: HDP 2.2.0.0-2041
> Reporter: Pavel Benes
> Priority: Critical
> Attachments: SQOOP-2387.1.patch, SQOOP-2387.2.patch,
> SQOOP-2387.patch, joblog.txt, sqoop.log
>
>
> This sqoop import:
> {code}
> sqoop import --connect jdbc:mysql://some.merck.com:1234/dbname --username XXX
> --password YYY --table some_table --hcatalog-database some_database
> --hcatalog-table some_table --hive-partition-key mg_version
> --hive-partition-value 2015-05-28-13-18 -m 1 --verbose --fetch-size
> -2147483648
> {code}
> fails with this error:
> {code}
> 2015-06-01 13:20:39,209 WARN [main] org.apache.hadoop.mapred.YarnChild:
> Exception running child : java.lang.NullPointerException
> at
> org.apache.hive.hcatalog.data.schema.HCatSchema.get(HCatSchema.java:105)
> at
> org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper.convertToHCatRecord(SqoopHCatImportHelper.java:194)
> at
> org.apache.sqoop.mapreduce.hcat.SqoopHCatImportMapper.map(SqoopHCatImportMapper.java:52)
> at
> org.apache.sqoop.mapreduce.hcat.SqoopHCatImportMapper.map(SqoopHCatImportMapper.java:34)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> {code}
> It seems that the error is caused by a column name containing a hyphen ('-').
> Column names are converted to Java identifiers, but later the converted name
> cannot be found in the HCatalog schema.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)