Hi guys,
I am using Sqoop 1.4.5 to import some data from MySQL into Hive using this
command:
sqoop import --connect jdbc:mysql://some.merck.com:1234/eqtl_gtex_raw \
  --username XXX --password YYY --table adipose_subcutaneous \
  --hcatalog-database mg_user_middlegate_benesp_mysql1 \
  --hcatalog-table adipose_subcutaneous \
  --hive-partition-key mg_version --hive-partition-value 2015-05-28-13-18 \
  -m 1 --verbose --fetch-size -2147483648
and it fails with this error:
2015-06-01 13:20:39,209 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NullPointerException
	at org.apache.hive.hcatalog.data.schema.HCatSchema.get(HCatSchema.java:105)
	at org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper.convertToHCatRecord(SqoopHCatImportHelper.java:194)
	at org.apache.sqoop.mapreduce.hcat.SqoopHCatImportMapper.map(SqoopHCatImportMapper.java:52)
	at org.apache.sqoop.mapreduce.hcat.SqoopHCatImportMapper.map(SqoopHCatImportMapper.java:34)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
After some investigation, it seems to be caused by hyphens in the column names.
I patched the Sqoop jar to write more information to the log:
2015-06-01 13:15:49,337 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Processing schema fields...
2015-06-01 13:15:49,337 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Adding field 'mg_version'
2015-06-01 13:15:49,337 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Field count: 6
2015-06-01 13:15:49,347 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Working on split: 1=1 AND 1=1
2015-06-01 13:15:49,360 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Executing query: SELECT `SNP`, `gene`, `beta`, ` t-stat`, `p-value` FROM `adipose_subcutaneous` AS `adipose_subcutaneous` WHERE ( 1=1 ) AND ( 1=1 )
2015-06-01 13:15:49,657 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Processing HCatRecord, listing schema fields ...
2015-06-01 13:15:49,657 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Field: snp
2015-06-01 13:15:49,663 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Field: gene
2015-06-01 13:15:49,663 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Field: beta
2015-06-01 13:15:49,663 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Field: t-stat
2015-06-01 13:15:49,663 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Field: p-value
2015-06-01 13:15:49,663 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Field: mg_version
2015-06-01 13:15:49,664 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Processing key: 'SNP'
2015-06-01 13:15:49,664 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Processing key: 'beta'
2015-06-01 13:15:49,664 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Processing key: 'gene'
2015-06-01 13:15:49,664 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Processing key: 'p_value'
2015-06-01 13:20:39,209 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NullPointerException
According to this log, Sqoop converts the original DB column names to lowercase
and replaces '-' characters with '_'. Columns without hyphens are resolved
correctly (e.g. 'SNP' -> 'snp'), but a column with hyphens (e.g. 'p-value' ->
'p_value') is not found in the HCatalog schema, where the field is still named
'p-value'.
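To make the mismatch concrete, here is a rough shell sketch of what the log suggests is happening: each DB column name is lowercased and its hyphens are replaced with underscores, and the result is then looked up among the HCatalog field names listed in the log. The normalization rule is only inferred from the log output (it may not be exactly what SqoopHCatImportHelper does), and the normalize/lookup helpers are hypothetical names for illustration, not Sqoop code.

```shell
#!/bin/sh
# Hypothetical sketch, NOT Sqoop's code: mimics the column-name handling
# inferred from the log above (lowercase, then '-' -> '_').

# HCatalog field names as printed in the log ("Field: ..." lines).
hcat_fields="snp gene beta t-stat p-value mg_version"

# normalize NAME: lowercase it and replace every hyphen with an underscore
normalize() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-/_/g'
}

# lookup KEY: succeed (status 0) if KEY is one of the HCatalog field names
lookup() {
  for f in $hcat_fields; do
    [ "$f" = "$1" ] && return 0
  done
  return 1
}

# DB column names taken from the generated SELECT in the log
for col in SNP gene beta p-value; do
  key=$(normalize "$col")
  if lookup "$key"; then
    echo "$col -> $key: found"
  else
    echo "$col -> $key: NOT found"
  fi
done
```

Running it reports "found" for SNP, gene and beta, and "NOT found" for p-value, which matches the point in the log where the task dies with the NullPointerException.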
I am also attaching the Sqoop log and the job log.
Is this a known issue, and is there any workaround for it? This is meant to be
a general import/ingest process, so unfortunately I have no control over the
names of the tables and columns being ingested.
Thanks,
Pavel
Log Type: syslog
Log Upload Time: Mon Jun 01 13:35:50 +0000 2015
Log Length: 6148
2015-06-01 13:15:46,891 WARN [main] org.apache.hadoop.metrics2.impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-maptask.properties,hadoop-metrics2.properties
2015-06-01 13:15:46,953 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-06-01 13:15:46,953 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2015-06-01 13:15:46,963 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2015-06-01 13:15:46,963 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1433145248836_0011, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@23428b92)
2015-06-01 13:15:47,049 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2015-06-01 13:15:47,316 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /media/ephemeral0/hadoop/yarn/local/usercache/ec2-user/appcache/application_1433145248836_0011,/media/ephemeral1/hadoop/yarn/local/usercache/ec2-user/appcache/application_1433145248836_0011
2015-06-01 13:15:47,909 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2015-06-01 13:15:48,481 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
2015-06-01 13:15:48,488 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
2015-06-01 13:15:48,513 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2015-06-01 13:15:48,978 INFO [main] org.apache.sqoop.mapreduce.db.DBInputFormat: Using read commited transaction isolation
2015-06-01 13:15:49,153 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: 1=1 AND 1=1
2015-06-01 13:15:49,184 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
2015-06-01 13:15:49,188 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
2015-06-01 13:15:49,337 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: HCatalog Storer Info1 :
Handler = null
Input format class = org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
Output format class = org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Serde class = org.apache.hadoop.hive.ql.io.orc.OrcSerde
Storer properties
transient_lastDdlTime=1432909549
serialization.format=1
2015-06-01 13:15:49,337 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Processing schema fields...
2015-06-01 13:15:49,337 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Adding field 'mg_version'
2015-06-01 13:15:49,337 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Field count: 6
2015-06-01 13:15:49,347 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Working on split: 1=1 AND 1=1
2015-06-01 13:15:49,360 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Executing query: SELECT `SNP`, `gene`, `beta`, ` t-stat`, `p-value` FROM `adipose_subcutaneous` AS `adipose_subcutaneous` WHERE ( 1=1 ) AND ( 1=1 )
2015-06-01 13:15:49,657 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Processing converting HCatRecord, listing schema fields ...
2015-06-01 13:15:49,657 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Field: snp
2015-06-01 13:15:49,663 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Field: gene
2015-06-01 13:15:49,663 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Field: beta
2015-06-01 13:15:49,663 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Field: t-stat
2015-06-01 13:15:49,663 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Field: p-value
2015-06-01 13:15:49,663 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Field: mg_version
2015-06-01 13:15:49,664 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Processing key: 'SNP'
2015-06-01 13:15:49,664 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Processing key: 'beta'
2015-06-01 13:15:49,664 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Processing key: 'gene'
2015-06-01 13:15:49,664 INFO [main] org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper: Processing key: 'p_value'
2015-06-01 13:20:39,209 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NullPointerException
	at org.apache.hive.hcatalog.data.schema.HCatSchema.get(HCatSchema.java:105)
	at org.apache.sqoop.mapreduce.hcat.SqoopHCatImportHelper.convertToHCatRecord(SqoopHCatImportHelper.java:194)
	at org.apache.sqoop.mapreduce.hcat.SqoopHCatImportMapper.map(SqoopHCatImportMapper.java:52)
	at org.apache.sqoop.mapreduce.hcat.SqoopHCatImportMapper.map(SqoopHCatImportMapper.java:34)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
2015-06-01 13:20:39,215 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
2015-06-01 13:20:39,230 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2015-06-01 13:20:39,231 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2015-06-01 13:20:39,231 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.