arifpratama398 opened a new issue #10456:
URL: https://github.com/apache/druid/issues/10456


   "Index datasource from Hadoop 3.1.1 hdfs failed in kerberized cluster"
   
   ### Affected Version
   
   0.18.1
   
   ### Description
   
   I am trying to index data from HDFS into Druid, but the task failed.
   
   Command :
   
   ```
   curl --negotiate -u:[email protected] -b /tmp/krb5cc_1008 -X 'POST' -H 
'Content-Type:application/json' -d @/home/druid/wikipedia-index-hadoop.json 
http://XXX.XXX.com:8390/druid/indexer/v1/task
   ```
   
   
   Json Spec :
   ```
   {
                  "type" : "index_hadoop",
                  "spec" : {
                    "dataSchema" : {
                      "dataSource" : "wikipedia_hadoop_29092020",
                      "parser" : {
                        "type" : "hadoopyString",
                        "parseSpec" : {
                          "format" : "json",
                          "dimensionsSpec" : {
                            "dimensions" : [
                              "channel",
                              "cityName",
                              "comment",
                              "countryIsoCode",
                              "countryName",
                              "isAnonymous",
                              "isMinor",
                              "isNew",
                              "isRobot",
                              "isUnpatrolled",
                              "metroCode",
                              "namespace",
                              "page",
                              "regionIsoCode",
                              "regionName",
                              "user",
                              { "name": "added", "type": "long" },
                              { "name": "deleted", "type": "long" },
                              { "name": "delta", "type": "long" }
                            ]
                          },
                          "timestampSpec" : {
                            "format" : "auto",
                            "column" : "time"
                          }
                        }
                      },
                      "metricsSpec" : [],
                      "granularitySpec" : {
                        "type" : "uniform",
                        "segmentGranularity" : "day",
                        "queryGranularity" : "none",
                        "intervals" : ["2015-09-12/2015-09-13"],
                        "rollup" : false
                      }
                    },
                    "ioConfig" : {
                      "type" : "hadoop",
                      "inputSpec" : {
                        "type" : "static",
                        "paths" : 
"/user/druid/quickstart/wikiticker-2015-09-12-sampled.json.gz"
                      }
                    },
                    "tuningConfig" : {
                      "type" : "hadoop",
                      "partitionsSpec" : {
                        "type" : "hashed",
                        "targetPartitionSize" : 5000000
                      },
                      "forceExtendableShardSpecs" : true,
                      "jobProperties" : {
                        "fs.default.name" : "hdfs://nn",
                        "fs.defaultFS" : "hdfs://nn/user/druid",
                        "dfs.datanode.address" : "0.0.0.0:50010",
                        "dfs.client.use.datanode.hostname" : "true",
                        "dfs.datanode.use.datanode.hostname" : "true",
                        "yarn.resourcemanager.hostname" : "xxx.xxx.com",
                        "yarn.nodemanager.vmem-check-enabled" : "false",
                        "mapreduce.map.java.opts" : "-Duser.timezone=UTC 
-Dfile.encoding=UTF-8",
                        "mapreduce.job.user.classpath.first" : "true",
                        "mapreduce.reduce.java.opts" : "-Duser.timezone=UTC 
-Dfile.encoding=UTF-8",
                        "mapreduce.map.memory.mb" : 1024,
                        "mapreduce.reduce.memory.mb" : 1024
                      }
                    }
                  },
                  "hadoopDependencyCoordinates": 
["org.apache.hadoop:hadoop-client:3.1.1"]
                }
   
   ```
   
   
   Errors while processing index in Task Log
   
   ```
   org.apache.hadoop.security.AccessControlException: Client cannot 
authenticate via:[TOKEN, KERBEROS]
   2020-09-30T03:27:20,417 WARN [task-runner-0-priority-0] 
org.apache.hadoop.ipc.Client - Exception encountered while connecting to the 
server : org.apache.hadoop.security.AccessControlException: Client cannot 
authenticate via:[TOKEN, KERBEROS]
   2020-09-30T03:27:20,531 WARN [task-runner-0-priority-0] 
org.apache.hadoop.ipc.Client - Exception encountered while connecting to the 
server : org.apache.hadoop.security.AccessControlException: Client cannot 
authenticate via:[TOKEN, KERBEROS]
   Error: 
com.google.inject.internal.Errors.checkNotNull(Ljava/lang/Object;Ljava/lang/String;)Ljava/lang/Object;
   Error: 
com.google.inject.internal.Errors.checkNotNull(Ljava/lang/Object;Ljava/lang/String;)Ljava/lang/Object;
   ```
   
   I have already configured Druid for the kerberized Hadoop cluster by setting the following in _common:
   ```
   druid.security.extensions.loadList=["druid-kerberos"]
   
druid.hadoop.security.kerberos.keytab=/etc/security/keytabs/druid.headless.keytab
   [email protected]
   ```
   
   I am also following the doc at 
https://druid.apache.org/docs/0.18.1/tutorials/tutorial-kerberos-hadoop.html, 
copying the Hadoop *-site.xml configuration files to the Druid conf directory, but I am still facing the same error.
   
   Am I missing something?
   
   Thanks in advance.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to