Hey Sadhan,

Sorry for my previous abrupt reply. Submitting an MR job is definitely wrong here; I'm investigating. Would you mind providing the Spark/Hive/Hadoop versions you are using? If you're on the most recent master branch, a concrete commit SHA-1 would be very helpful.
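
If it helps, the following spark-shell snippet prints the versions in play, and `git rev-parse HEAD` in your Spark checkout gives the commit SHA-1. A minimal sketch (assumes a running spark-shell; the Hive line only works if hive-common is on the classpath):

    // Print the Spark and Hadoop versions from spark-shell.
    println(s"Spark:  ${sc.version}")
    println(s"Hadoop: ${org.apache.hadoop.util.VersionInfo.getVersion}")
    // Hive version, only if hive-common is on the classpath:
    println(s"Hive:   ${org.apache.hive.common.util.HiveVersionInfo.getVersion}")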

Thanks!
Cheng


On 11/12/14 12:34 AM, Sadhan Sood wrote:
Hi Cheng,

I made sure the only Hive server running on the machine is HiveThriftServer2. Here is the command it was launched with:

/usr/lib/jvm/default-java/bin/java -cp /usr/lib/hadoop/lib/hadoop-lzo.jar::/mnt/sadhan/spark-3/sbin/../conf:/mnt/sadhan/spark-3/spark-assembly-1.2.0-SNAPSHOT-hadoop2.3.0-cdh5.0.2.jar:/etc/hadoop/conf -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --master yarn --jars reporting.jar spark-internal

The query I am running is a simple count(*), "select count(*) from Xyz where date_prefix=20141031", and I'm pretty sure it's submitting a MapReduce job based on the Spark logs below (a spark-shell cross-check follows the log excerpt):

TakesRest=false

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

  set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

  set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

  set mapreduce.job.reduces=<number>

14/11/11 16:23:17 INFO ql.Context: New scratch dir is hdfs://fdsfdsfsdfsdf:9000/tmp/hive-ubuntu/hive_2014-11-11_16-23-17_333_5669798325805509526-2

Starting Job = job_1414084656759_0142, Tracking URL = http://xxxxxxx:8100/proxy/application_1414084656759_0142/

Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1414084656759_0142
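
As a cross-check, I can run the same count through a HiveContext in spark-shell and watch the Spark UI; with a working Thrift server, the Beeline query should produce the same kind of Spark stages. A minimal sketch against the 1.2-era API (table name and predicate copied from the query above):

    import org.apache.spark.sql.hive.HiveContext

    // Wrap the shell's SparkContext in a HiveContext and run the same
    // count(*); the resulting stages should show up in the Spark UI.
    val hiveContext = new HiveContext(sc)
    hiveContext.sql(
      "select count(*) from Xyz where date_prefix=20141031")
      .collect()
      .foreach(println)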


On Mon, Nov 10, 2014 at 9:59 PM, Cheng Lian <lian.cs....@gmail.com> wrote:

    Hey Sadhan,

    I really don't think these are Spark logs... Unlike Shark, Spark
    SQL doesn't even provide a Hive mode that executes queries through
    Hive. Would you please check whether there is an existing
    HiveServer2 instance running on that machine? Spark SQL's
    HiveThriftServer2 is a Spark port of HiveServer2, and the two
    share the same default listening port. My guess is that the Thrift
    server failed to start because HiveServer2 had already occupied
    the port, and your Beeline session was actually connected to
    HiveServer2.
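
    One quick way to test that hypothesis is to probe the default
    HiveServer2/HiveThriftServer2 port before starting the Thrift
    server. A minimal sketch, assuming localhost and the default port
    10000 (adjust if hive.server2.thrift.port is overridden):

        import java.net.Socket

        // If this connect succeeds, something is already listening on
        // port 10000 and HiveThriftServer2 cannot bind to it.
        val occupied =
          try { new Socket("localhost", 10000).close(); true }
          catch { case _: java.io.IOException => false }
        println(if (occupied) "port 10000 is taken" else "port 10000 is free")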

    Cheng


    On 11/11/14 8:29 AM, Sadhan Sood wrote:
    I was testing out the Spark Thrift JDBC server by running a
    simple query from the Beeline client. Spark itself is running on
    a YARN cluster.

    However, when I run a query in Beeline, I see no running jobs in
    the Spark UI (it is completely empty), and the YARN UI seems to
    indicate that the submitted query is being run as a MapReduce
    job. The Spark logs probably show this as well, but I am not
    completely sure:

    2014-11-11 00:19:00,492 INFO  ql.Context
    (Context.java:getMRScratchDir(267)) - New scratch dir is
    hdfs://xxxxxxxx:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-1

    2014-11-11 00:19:00,877 INFO  ql.Context
    (Context.java:getMRScratchDir(267)) - New scratch dir is
    hdfs://xxxxxxxx:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-2

    2014-11-11 00:19:04,152 INFO  ql.Context
    (Context.java:getMRScratchDir(267)) - New scratch dir is
    hdfs://xxxxxxxx:9000/tmp/hive-ubuntu/hive_2014-11-11_00-19-00_367_3847629323646885865-2

    2014-11-11 00:19:04,425 INFO Configuration.deprecation
    (Configuration.java:warnOnceIfDeprecated(1009)) -
    mapred.submit.replication is deprecated. Instead, use
    mapreduce.client.submit.file.replication

    2014-11-11 00:19:04,516 INFO client.RMProxy
    (RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager
    at xxxxxxxx:8032

    2014-11-11 00:19:04,607 INFO client.RMProxy
    (RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager
    at xxxxxxxx:8032

    2014-11-11 00:19:04,639 WARN mapreduce.JobSubmitter
    (JobSubmitter.java:copyAndConfigureFiles(150)) - Hadoop
    command-line option parsing not performed. Implement the Tool
    interface and execute your application with ToolRunner to remedy this

    2014-11-11 00:00:08,806 INFO  input.FileInputFormat
    (FileInputFormat.java:listStatus(287)) - Total input paths to
    process : 14912

    2014-11-11 00:00:08,864 INFO  lzo.GPLNativeCodeLoader
    (GPLNativeCodeLoader.java:<clinit>(34)) - Loaded native gpl library

    2014-11-11 00:00:08,866 INFO  lzo.LzoCodec
    (LzoCodec.java:<clinit>(76)) - Successfully loaded & initialized
    native-lzo library [hadoop-lzo rev
    8e266e052e423af592871e2dfe09d54c03f6a0e8]

    2014-11-11 00:00:09,873 INFO  input.CombineFileInputFormat
    (CombineFileInputFormat.java:createSplits(413)) - DEBUG:
    Terminated node allocation with : CompletedNodes: 1, size left:
    194541317

    2014-11-11 00:00:10,017 INFO  mapreduce.JobSubmitter
    (JobSubmitter.java:submitJobInternal(396)) - number of splits:615

    2014-11-11 00:00:10,095 INFO  mapreduce.JobSubmitter
    (JobSubmitter.java:printTokens(479)) - Submitting tokens for job:
    job_1414084656759_0115

    2014-11-11 00:00:10,241 INFO  impl.YarnClientImpl
    (YarnClientImpl.java:submitApplication(167)) - Submitted
    application application_1414084656759_0115


    It seems like the query is being run as a Hive MapReduce job
    instead of a Spark job. The same query works fine when run from
    the spark-sql CLI.
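
    For reference, Beeline reaches the server through the ordinary
    HiveServer2 JDBC protocol, so whichever process owns the port
    answers the query. A minimal sketch of that path, assuming the
    default port 10000, the Hive JDBC driver on the classpath, and an
    illustrative table name:

        import java.sql.DriverManager

        // Connect the same way Beeline does; whichever server is
        // listening on port 10000 executes this query.
        Class.forName("org.apache.hive.jdbc.HiveDriver")
        val conn = DriverManager.getConnection(
          "jdbc:hive2://localhost:10000/default", "", "")
        val stmt = conn.createStatement()
        val rs = stmt.executeQuery("select count(*) from some_table")
        while (rs.next()) println(rs.getLong(1))
        conn.close()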



