Nick Allen created METRON-2038:
----------------------------------
Summary: Enrichment Loader Fails When Run as MR Job
Key: METRON-2038
URL: https://issues.apache.org/jira/browse/METRON-2038
Project: Metron
Issue Type: Bug
Reporter: Nick Allen
Assignee: Nick Allen
The enrichment loader (flatfile_loader.sh) fails when run as a MapReduce (MR) job on YARN.
The same load succeeds in local mode.
The following exception is logged inside the YARN container while the MR application master starts.
{code}
2019-03-13 16:14:28,391 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HBaseConfiguration.createClusterConf(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;Ljava/lang/String;)Lorg/apache/hadoop/conf/Configuration;
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:204)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:517)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:501)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1640)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:501)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:287)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1598)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1595)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1526)
2019-03-13 16:14:28,394 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
{code}
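A NoSuchMethodError means the JVM found the HBaseConfiguration class but not a createClusterConf(Configuration, String, String) method on it, which usually points to a different version of the library on the runtime classpath than the one the caller was compiled against. This report does not confirm that as the cause here. As a rough diagnostic sketch, the helper below lists the jars under a directory that bundle a given class, so each copy can be inspected. The name find_class_jars and the search directory are hypothetical; the helper relies on the fact that entry names are stored uncompressed inside a jar.

```shell
# Hypothetical helper (not part of Metron or Hadoop): print every *.jar
# under a directory whose archive contains the given class.
# Jar entry names are stored as plain text, so a fixed-string grep is enough.
find_class_jars() {
  search_dir="$1"
  # Convert a dotted class name into its jar entry path, e.g.
  # org.apache.hadoop.hbase.HBaseConfiguration
  #   -> org/apache/hadoop/hbase/HBaseConfiguration.class
  entry="$(printf '%s' "$2" | tr '.' '/').class"
  find "$search_dir" -type f -name '*.jar' -exec grep -lF -- "$entry" {} +
}

# Example usage (SEARCH_DIR is an assumption; point it at wherever the
# job's jars actually live on the node):
#   find_class_jars "$SEARCH_DIR" org.apache.hadoop.hbase.HBaseConfiguration
# Each hit can then be checked for the method (requires a JDK):
#   javap -cp /path/to/hit.jar org.apache.hadoop.hbase.HBaseConfiguration | grep createClusterConf
```

If more than one jar is reported, and only some of them expose createClusterConf, the order in which YARN places them on the container classpath would decide which copy is loaded.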
Steps to Replicate
1. Create a data set of enrichments to load.
{code}
[root@node1 0.7.1]# cat alexa.csv
1,google.com
2,youtube.com
3,facebook.com
4,baidu.com
5,wikipedia.org
6,yahoo.com
7,google.co.in
8,reddit.com
9,qq.com
10,amazon.com
11,taobao.com
12,google.co.jp
13,twitter.com
14,tmall.com
15,vk.com
16,live.com
17,instagram.com
18,sohu.com
19,sina.com.cn
20,weibo.com
21,jd.com
22,360.cn
23,google.de
24,google.co.uk
25,google.ru
26,google.fr
27,google.com.br
28,list.tmall.com
29,linkedin.com
30,google.com.hk
31,netflix.com
32,yandex.ru
33,google.it
34,yahoo.co.jp
35,google.es
36,t.co
37,pornhub.com
38,ebay.com
39,imgur.com
40,google.com.mx
41,google.ca
42,alipay.com
43,twitch.tv
44,xvideos.com
45,bing.com
46,youth.cn
47,msn.com
48,aliexpress.com
49,tumblr.com
50,ok.ru
{code}
2. Push the data to HDFS.
{code}
hdfs dfs -put alexa.csv /tmp
{code}
3. Create the enrichment definition.
{code}
[root@node1 0.7.1]# cat enrichment.json
{
  "zkQuorum":"node1:2181",
  "sensorToFieldList":{
    "squid":{
      "type":"ENRICHMENT",
      "fieldToEnrichmentTypes":{
        "domain_without_subdomains":[
          "whois",
          "alexa"
        ]
      }
    }
  }
}
{code}
4. Create the extractor definition.
{code}
[root@node1 0.7.1]# cat extractor.json
{
  "config" : {
    "columns" : {
      "domain" : 1,
      "rank" : 0
    }
    ,"indicator_column" : "domain"
    ,"type" : "alexa"
    ,"separator" : ","
  },
  "extractor" : "CSV"
}
{code}
5. Execute the loader.
{code}
/usr/metron/0.7.1/bin/flatfile_loader.sh -n ./enrichment.json -t enrichment -c t -e ./extractor.json -i /tmp/alexa.csv -m MR
19/03/13 16:12:26 WARN extractor.TransformFilterExtractorDecorator: Unable to setup zookeeper client - zk_quorum url not provided. **This will limit some Stellar functionality**
19/03/13 16:12:26 INFO importer.MapReduceImporter: Configuring MapReduceImporter: /tmp/alexa.csv => enrichment:t
19/03/13 16:12:27 INFO client.RMProxy: Connecting to ResourceManager at node1/127.0.0.1:8050
19/03/13 16:12:27 INFO client.AHSProxy: Connecting to Application History server at node1/127.0.0.1:10200
19/03/13 16:14:09 INFO input.FileInputFormat: Total input paths to process : 1
19/03/13 16:14:10 INFO mapreduce.JobSubmitter: number of splits:1
19/03/13 16:14:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552492524533_0003
19/03/13 16:14:12 INFO impl.YarnClientImpl: Submitted application application_1552492524533_0003
19/03/13 16:14:12 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552492524533_0003/
19/03/13 16:14:12 INFO mapreduce.Job: Running job: job_1552492524533_0003
19/03/13 16:14:33 INFO mapreduce.Job: Job job_1552492524533_0003 running in uber mode : false
19/03/13 16:14:33 INFO mapreduce.Job: map 0% reduce 0%
19/03/13 16:14:33 INFO mapreduce.Job: Job job_1552492524533_0003 failed with state FAILED due to: Application application_1552492524533_0003 failed 2 times due to AM Container for appattempt_1552492524533_0003_000002 exited with exitCode: 1
For more detailed output, check the application tracking page: http://node1:8088/cluster/app/application_1552492524533_0003 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e01_1552492524533_0003_02_000001
Exit code: 1
{code}
6. The root-cause exception (the NoSuchMethodError shown at the top of this report) is visible in the YARN container logs or in the application tracking UI.
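For the last step, the container logs can usually be pulled from the command line with the yarn logs CLI, provided log aggregation is enabled on the cluster (an assumption; this report does not say). The sketch below adds a small filter, show_fatal, a hypothetical helper name, that prints each FATAL line together with the stack trace lines that follow it.

```shell
# Hypothetical helper: print every line containing FATAL plus up to
# N following lines (default 30), reading from stdin.
show_fatal() {
  grep -A "${1:-30}" 'FATAL'
}

# Example usage on the node (application id taken from the loader output):
#   yarn logs -applicationId application_1552492524533_0003 | show_fatal
```

The context size of 30 lines is arbitrary; it is only meant to be large enough to cover a stack trace of the length shown in this report.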