Nick Allen created METRON-2038:
----------------------------------

             Summary: Enrichment Loader Fails When Run as MR Job
                 Key: METRON-2038
                 URL: https://issues.apache.org/jira/browse/METRON-2038
             Project: Metron
          Issue Type: Bug
            Reporter: Nick Allen
            Assignee: Nick Allen


The enrichment loader (flatfile_loader.sh) fails when run as a MapReduce (MR) job 
on YARN. The same load succeeds when run in local mode.

The following exception is thrown inside the YARN container that runs the MapReduce ApplicationMaster.
{code}
2019-03-13 16:14:28,391 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HBaseConfiguration.createClusterConf(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;Ljava/lang/String;)Lorg/apache/hadoop/conf/Configuration;
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:204)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:517)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$2.call(MRAppMaster.java:501)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1640)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:501)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:287)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$5.run(MRAppMaster.java:1598)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1595)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1526)
2019-03-13 16:14:28,394 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
{code}


Steps to Replicate

1. Create a data set of enrichments to load.
{code}
[root@node1 0.7.1]# cat alexa.csv
1,google.com
2,youtube.com
3,facebook.com
4,baidu.com
5,wikipedia.org
6,yahoo.com
7,google.co.in
8,reddit.com
9,qq.com
10,amazon.com
11,taobao.com
12,google.co.jp
13,twitter.com
14,tmall.com
15,vk.com
16,live.com
17,instagram.com
18,sohu.com
19,sina.com.cn
20,weibo.com
21,jd.com
22,360.cn
23,google.de
24,google.co.uk
25,google.ru
26,google.fr
27,google.com.br
28,list.tmall.com
29,linkedin.com
30,google.com.hk
31,netflix.com
32,yandex.ru
33,google.it
34,yahoo.co.jp
35,google.es
36,t.co
37,pornhub.com
38,ebay.com
39,imgur.com
40,google.com.mx
41,google.ca
42,alipay.com
43,twitch.tv
44,xvideos.com
45,bing.com
46,youth.cn
47,msn.com
48,aliexpress.com
49,tumblr.com
50,ok.ru
{code}

2. Push the data to HDFS.
{code}
hdfs dfs -put alexa.csv /tmp
{code}

3. Create the enrichment definition.
{code}
[root@node1 0.7.1]# cat enrichment.json
{
  "zkQuorum": "node1:2181",
  "sensorToFieldList": {
    "squid": {
      "type": "ENRICHMENT",
      "fieldToEnrichmentTypes": {
        "domain_without_subdomains": [
          "whois",
          "alexa"
        ]
      }
    }
  }
}
{code}

4. Create the extractor definition.
{code}
[root@node1 0.7.1]# cat extractor.json
{
  "config": {
    "columns": {
      "domain": 1,
      "rank": 0
    },
    "indicator_column": "domain",
    "type": "alexa",
    "separator": ","
  },
  "extractor": "CSV"
}
{code}

5. Execute the loader.
{code}
/usr/metron/0.7.1/bin/flatfile_loader.sh -n ./enrichment.json -t enrichment -c t -e ./extractor.json -i /tmp/alexa.csv -m MR

19/03/13 16:12:26 WARN extractor.TransformFilterExtractorDecorator: Unable to setup zookeeper client - zk_quorum url not provided. **This will limit some Stellar functionality**
19/03/13 16:12:26 INFO importer.MapReduceImporter: Configuring MapReduceImporter: /tmp/alexa.csv => enrichment:t
19/03/13 16:12:27 INFO client.RMProxy: Connecting to ResourceManager at node1/127.0.0.1:8050
19/03/13 16:12:27 INFO client.AHSProxy: Connecting to Application History server at node1/127.0.0.1:10200

19/03/13 16:14:09 INFO input.FileInputFormat: Total input paths to process : 1
19/03/13 16:14:10 INFO mapreduce.JobSubmitter: number of splits:1
19/03/13 16:14:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552492524533_0003
19/03/13 16:14:12 INFO impl.YarnClientImpl: Submitted application application_1552492524533_0003
19/03/13 16:14:12 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1552492524533_0003/
19/03/13 16:14:12 INFO mapreduce.Job: Running job: job_1552492524533_0003
19/03/13 16:14:33 INFO mapreduce.Job: Job job_1552492524533_0003 running in uber mode : false
19/03/13 16:14:33 INFO mapreduce.Job: map 0% reduce 0%
19/03/13 16:14:33 INFO mapreduce.Job: Job job_1552492524533_0003 failed with state FAILED due to: Application application_1552492524533_0003 failed 2 times due to AM Container for appattempt_1552492524533_0003_000002 exited with exitCode: 1
For more detailed output, check the application tracking page: http://node1:8088/cluster/app/application_1552492524533_0003 Then click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e01_1552492524533_0003_02_000001
Exit code: 1
{code}

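6. The loader output reports only the container exit code. The root-cause 
exception, the NoSuchMethodError shown at the top of this report, is visible in 
the YARN logs or the application tracking UI.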
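
For anyone replicating this from the command line rather than the UI, the following is a minimal sketch of pulling the same exception out of the aggregated YARN logs. It assumes YARN log aggregation is enabled on the cluster and that the loader's console output was saved to a file; the helper name extract_app_id and the file name loader.out are hypothetical and not part of Metron.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read loader console output on stdin and print the
# first YARN application id found (e.g. application_1552492524533_0003).
extract_app_id() {
  grep -Eo 'application_[0-9]+_[0-9]+' | head -n 1
}

# Example usage (assumes the output from step 5 was saved to loader.out):
#   app_id=$(extract_app_id < loader.out)
#   yarn logs -applicationId "$app_id" | grep -A 5 'NoSuchMethodError'
```

The id is taken from the "Submitted application" line in the loader output. The job id (job_...) shares the same numeric suffix, but `yarn logs -applicationId` expects the application_ form.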



