Sent: 27 November 2015 14:03
To: Mich Talebzadeh <m...@peridale.co.uk>
Cc: user <user@spark.apache.org>
Subject: Re: Hive using Spark engine alone
Hi,
I recommend using the latest version of Hive. You may also wait for Hive on
Tez with Tez version >= 0.8 and Hive >
For those interested
From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
Sent: 06 December 2015 20:33
To: u...@hive.apache.org
Subject: Managed to make Hive run on Spark engine
Thanks to all, especially Xuefu, for the contributions. Finally it works, which
means don’t give up until it works.
Thanks, sorted.
Actually I used version 1.3.1 and now I have managed to make it work as the
Hive execution engine.
Cheers,
Mich Talebzadeh
Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
http://login.sybase.com/files
Or try this
cast(from_unixtime(unix_timestamp()) AS timestamp)
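For example, from the spark-sql prompt (a minimal sketch; the alias run_time
is illustrative):
spark-sql> select cast(from_unixtime(unix_timestamp()) AS timestamp) AS run_time;
That returns the current time as a proper timestamp rather than a string.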
HTH
Mich Talebzadeh
Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
Author
), otherwise
it will have to use disk space. So it boils down to how much memory you
have.
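You can make that trade-off explicit by choosing a storage level yourself,
e.g. (a minimal sketch in Scala; rdd stands for whatever RDD you are caching):
import org.apache.spark.storage.StorageLevel
rdd.persist(StorageLevel.MEMORY_AND_DISK) // spill partitions to local disk once memory runs out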
HTH
Mich Talebzadeh
Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
http://login.sybase.com/files/Product_Overviews/ASE-Winning
Hi,
I have seen mails stating that users have managed to build Spark 1.3 to
work with Hive. I tried Spark 1.5.2 but had no luck.
I downloaded the Spark 1.3 source code (spark-1.3.0.tar) and built it as
follows:
./make-distribution.sh --name "hadoop2-without-hive" --tgz
Hi,
I am trying to make Hive work with Spark.
I have been told that I need to use Spark 1.3 and build it from source code
WITHOUT HIVE libraries.
I have built it as follows:
./make-distribution.sh --name "hadoop2-without-hive" --tgz
and try again
Thanks,
Mich Talebzadeh
Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15"
Thanks, I tried all :(
I am trying to make Hive use Spark; apparently Hive can use version 1.3 of
Spark as its execution engine. Frankly I don’t know why this is not working!
Mich Talebzadeh
Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial
ClientImpl:at
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
15/12/03 17:53:19 [stderr-redir-1]: INFO client.SparkClientImpl:at
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Any clues?
Mich Talebzadeh
Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Runnin
). There
will be a job scheduler and one or more Spark Executors depending on the
cluster. So as far as I can see both diagrams are correct.
HTH
Mich Talebzadeh
Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
<http://login.sybase.com/fi
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: 25 November 2015 22:35
To: Mich Talebzadeh <m...@peridale.co.uk>
Cc
Mich Talebzadeh
Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed
with message:
Detected Maven Version: 3.3.1 is not in the allowed range 3.3.3.
Mich Talebzadeh
Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
http://login.sybase.com
Hi,
I am trying to build Spark from the source without Hive. I am getting
[error] Required file not found: scala-compiler-2.10.4.jar
[error] See zinc -help for information about locating necessary files
I have to run this as root otherwise the build does not progress. Any help is
appreciated.
.loadClass(ClassLoader.java:357)
... 6 more
Although I have added it to the CLASSPATH.
Mich Talebzadeh
Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
<http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strateg
. The primary reason I want to use Hive on Spark engine is
for performance.
Thanks,
Mich Talebzadeh
Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
<http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091
From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
Sent: 20 November 2015 21:14
To: u...@hive.apache.org
Subject: starting spark-shell throws /tmp/hive on HDFS should be writable
error
Hi,
Has this been resolved? I don't think this has anything to do with the
/tmp/hive directory permissions.
rvers" ->
"rhes564:9092", "schema.registry.url" -> "http://rhes564:8081;,
"zookeeper.connect" -> "rhes564:2181", "group.id" ->
"CEP_streaming_with_JDBC" )
val topics = Set("newtopic")
val dstream =
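The usual continuation with the Spark 1.x direct Kafka API looks like this (a
sketch, assuming the map above is bound to kafkaParams and ssc is your
StreamingContext):
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils
val dstream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)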
hang on are you saving this as a new table?
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On
ok, what is the new column called? You are basically adding a new column
to an already existing table
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view
interesting. a vm with one core!
one simple test
can you try running with
--executor-cores=1
and see if it works ok please
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view
like a connection is left open but I cannot establish why!
Thanks
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
what version of spark are you using?
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 3 June 2016
by dt in notime
Now what I don't understand is whether that table is already partitioned, as
you said the table already exists!
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view
/Hadoop/slaves
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 5 June 2016 at 10:50, Marco Cap
sure I am trying to use
SparkContext.setCheckpointDir(directory: String)
to set it up.
I agree that once one starts creating subdirectories
like "~/checkpoints/${APPLICATION_NAME}/${USERNAME}!" it becomes a bit messy
cheers
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/pr
Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 3 June 2016 at 20:48, Mich Talebzadeh <mich.talebza...@gmail.c
}
}
I need to change one of these.
Actually a better alternative would be for each application to have its own
checkpoint directory?
Thanks
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gB
I use YARN as I run Hive on the Spark engine in yarn-cluster mode, plus other
stuff. If I turn off YARN half of my applications won't work. I don't see a
great concern with supporting YARN. However you may have other reasons.
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id
Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 5 June 2016 at 14:09, Mich Talebzadeh <mic
Task 0 in stage 1.0 failed 1 times;
aborting job
Suggested solution:
In a concurrent env, Spark should apply locks in order to prevent such
operations. Locks are kept in the Hive metastore table HIVE_LOCKS.
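You can also inspect the locks currently held from Hive itself, e.g. (a
sketch; the table name dummy is illustrative):
hive> show locks;
hive> show locks dummy extended;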
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view
1. With Hive --> table is locked as SHARED_READ
2. With Spark --> No locks at all
3. With HIVE --> No locks on the target table
4. With Spark --> No locks at all
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https:
issue here
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 8 June 2016 at 22:36, Michael
In Hive there is the issue with DDL + DML locks applied in a
single transaction, i.e. --> create table A as select * from b
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profil
check port 8080 on the node where you started start-master.sh
[image: Inline images 2]
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV
that Spark assumes no concurrency
for Hive tables. It is probably the same reason why updates/deletes to Hive
ORC transactional tables through Spark fail.
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.
if they are down?
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 9 June 2016 at 01:27,
I assume your zookeeper is up and running.
Can you confirm that you are getting topics from kafka independently, for
example on the command line:
${KAFKA_HOME}/bin/kafka-console-consumer.sh --zookeeper rhes564:2181
--from-beginning --topic newtopic
Dr Mich Talebzadeh
LinkedIn *
https
tool/language to dig in to that data. For example twitter
streaming data. I am getting all sorts of stuff coming in. Say I am only
interested in certain topics like sport etc. How can I detect the signal
from the noise, and using what tool and language?
Thanks
Dr Mich Talebzadeh
LinkedIn *
ht
You are aggregating data that you are collecting over
the batch window.
val countByValueAndWindow = price.filter(_ >
95.0).countByValueAndWindow(Seconds(windowLength), Seconds(slidingInterval))
countByValueAndWindow.print()
//
ssc.start()
ssc.awaitTermination()
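One caveat worth flagging: windowed operations like countByValueAndWindow
need checkpointing enabled beforehand, e.g. (the directory is assumed):
ssc.checkpoint("hdfs://rhes564:9000/user/hduser/checkpoint")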
HTH
Dr Mich Talebzadeh
LinkedIn *
https://w
*/java
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 7 June 2016 at 11:59, Dominik S
and the output below at the same time running, to see the exact cause of it:
${KAFKA_HOME}/bin/kafka-console-consumer.sh --zookeeper rhes564:2181
--from-beginning --topic newtopic
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<ht
by default the driver will start where you have started
sbin/start-master.sh. That is where you start your app with spark-submit.
The slaves have to have an entry in the slaves file.
What is the issue here?
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id
} \
${OUTPUT_FILE_INTERVAL_IN_SECS:-10} \
${OUTPUT_FILE_PARTITIONS_EACH_INTERVAL:-1} \
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view
?
the issue I believe you may face is that as you go from t0 -> t1 -> tn your
volume of data is going to rise.
How about periodic storage of your analysis and working on deltas only
afterwards?
What sort of data is it? Is it typical web-user data?
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/prof
Interesting. There is also apache nifi <https://nifi.apache.org/>
Also I note that one can store twitter data in Hive tables as well?
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profil
probability much more powerful than other nodes. Also the node running the
resource manager is also running one of the node managers as well.
So in theory maybe, in practice maybe not?
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
all resources in use all the time.
However, the resource manager itself is on the resource manager node.
Now I always start my Spark app on the same node as the resource manager
node and let Yarn take care of the rest.
Thanks
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id
"indexing") via JSON, XML, CSV or binary over HTTP.
You query it via HTTP GET and receive JSON, XML, CSV or binary results.
thanks
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/pr
a typical question.
You mentioned Spark ml (machine learning?). Is that something viable?
Cheers
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV
thanks I will have a look.
Mich
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 7 June 2016 at
I use Spark rather than Sqoop to import data from an Oracle table into a
Hive ORC table.
It uses JDBC for this purpose. All inclusive in Scala itself.
Also Hive runs on the Spark engine, an order of magnitude faster than Hive
on map-reduce.
Pretty simple.
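A minimal sketch of the JDBC read and the ORC write (Spark 1.4+ DataFrame
API; the connection details and table names are hypothetical):
val df = sqlContext.read.format("jdbc").
  option("url", "jdbc:oracle:thin:@rhes564:1521:mydb").
  option("dbtable", "scott.sales").
  option("user", "scott").
  option("password", "tiger").
  load()
df.write.format("orc").saveAsTable("oraclehadoop.sales") // assumes a HiveContext for the ORC write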
HTH
Dr Mich Talebzadeh
LinkedIn
OK so this was a Kafka issue?
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 7 June 2016 at
to make much difference. It sounds like yarn-cluster
supersedes yarn-client?
Any comments welcome
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6z
trust that I am not nitpicking here!
Cheers,
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
Hi,
You basically want to use wired/Ethernet connections as opposed to wireless?
In your Spark Web UI, under the Environment tab, what do you get for
"spark.driver.host"?
Also can you cat /etc/hosts and send the output please, plus the output
from ifconfig -a.
HTH
Dr Mich Talebzadeh
Hi John,
I did not notice anything unusual in your env variables.
However, what are the batch interval, the windowLength and the slidingWindow
interval?
Also how many messages are sent by Kafka in a typical batch interval?
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile
What is the nature of this spark streaming, if you can divulge it?
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
not make it worthwhile.
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 10 June 2016 at
Anything between a pair of "" will be interpreted as text, NOT a column name.
In Spark SQL you do not need double quotes. So simply:
spark-sql> select prod_id, cust_id from sales limit 2;
17 28017
18 10419
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.li
",
"orc.row.index.stride"="1" )
"""
HiveContext.sql(sqltext)
//
// Put data in Hive table. Clean up is already done
//
sqltext = """
INSERT INTO TABLE oraclehadoop.dummy
SELECT
ID
, CLUSTERED
, S
Hi Rutuja,
I am not certain whether such a tool exists or not. However, opening a JIRA
may be beneficial and would not do any harm.
You may look for a workaround. Now my understanding is that your need is for
monitoring the health of the cluster?
HTH
Dr Mich Talebzadeh
LinkedIn *
https
how are you doing the insert? from an existing table?
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpre
how to
deduce if there was indeed spillage to disk by Spark, see (TungstenAggregate)
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUr
can you provide a code snippet of how you are populating the target table
from the temp table?
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV
are you using map-reduce with Hive?
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 9 June 2016
0 2016-06-03 23:38
/user/hduser/checkpoint/TwitterAnalyzer$/receivedBlockMetadata
-rw-r--r-- 2 hduser supergroup 5199 2016-06-03 23:39
/user/hduser/checkpoint/TwitterAnalyzer$/temp
It works fine.
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin
I know this question may not be directly relevant, but what are the main
approaches: one, real-time analysis of twitter using spark streaming; the
other, storing the data in hdfs and using it later?
Thanks
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view
E} \
${OUTPUT_DIRECTORY:-/tmp/tweets} \
${NUM_TWEETS_TO_COLLECT:-10000} \
${OUTPUT_FILE_INTERVAL_IN_SECS:-10} \
${OUTPUT_FILE_PARTITIONS_EACH_INTERVAL:-1} \
>> ${LOG_FILE}
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEW
om> wrote:
> Or combine both! It is possible with Spark Streaming to combine streaming
> data and data on HDFS. In the end it always depends what you want to do and when
> you need what.
>
> On 03 Jun 2016, at 10:26, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>
. Unlike
Local or Spark standalone modes, in which the master’s address is specified
in the --master parameter, in YARN mode the ResourceManager’s address is
picked up from the Hadoop configuration. Thus, the --master parameter is
yarn.
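In code this means you normally leave the master unset and let spark-submit
supply it (a minimal sketch; the app name is hypothetical):
import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf().setAppName("MyYarnApp") // no setMaster: comes from spark-submit --master yarn
val sc = new SparkContext(conf)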
HTH
Dr Mich Talebzadeh
LinkedIn *
https
st and your
best hope is using all the available cores.
Hence in summary by using Spark in standalone mode (actually this
terminology is a bit misleading, it would be better if they called it Spark
Own Scheduler Mode (OSM)), you will have better performance due to the
clustering nature of Spark.
HTH
Dr Mi
yes absolutely Ted.
Thanks for highlighting it
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
(and these are pretty
well respected when it comes to Spark) and progress from there.
If you have a certain problem then put to this group and I am sure someone
somewhere in this forum has come across it. Also most of these books'
authors actively contribute to this mailing list.
HTH
Dr Mich Talebzadeh
it is good to be in control :)
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 14 June 2016
"="0.05",
"orc.stripe.size"="268435456",
"orc.row.index.stride"="1" )
"""
sql(sqltext)
sql("select count(1) from test.orctype").show
res2: org.apache.spark.sql.DataFrame = [result: string]
+--
In all probability there is no user database created in Hive.
Create a database yourself:
sql("create database if not exists test")
It would be helpful if you grasped some concepts of Hive databases etc.
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profi
Hi Swetha,
Have you actually tried doing this in Hive using Hive CLI or beeline?
Thanks
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
ption, Value, Balance,
AccountName, AccountNumber from tmp").take(2)
replace those with your column names. They are mapped using a case class.
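A sketch of that mapping (field names and types are assumed; rdd is an
RDD[Array[String]] from your source):
case class Account(Value: String, Balance: Double, AccountName: String, AccountNumber: String)
import sqlContext.implicits._
val df = rdd.map(p => Account(p(0), p(1).toDouble, p(2), p(3))).toDF()
df.registerTempTable("tmp")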
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/pr
at last some progress :)
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 15 June 2016 at 10:5
Have you looked at the Spark GUI to see what it is waiting for? Is it
available memory? What is the resource manager you are using?
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view
--conf "spark.ui.port=4040" \
${JAR_FILE}
The Spark GUI port is 4040 (the default). Just track the progress of the
job. You can specify your own port by replacing 4040 with a non-used port
value.
Try it anyway.
HTH
Dr Mich Talebzadeh
LinkedIn *
ollow">Twitter for
iPhone
itter.com/download/android" rel="nofollow">Twitter for Android Free
Lyft credit with Lyft promo code LYFTLUSHpp.com" rel="nofollow">Buffer
naliar for iPad
third person
男子南ことりが大好きなラブライバーです! ラブライブ大好きな人ぜひフォローしてください
固定ツイートお願いします
ラブライブに出会え
memory/heap/cpu etc
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 29 May 2016 at 00:26
You are welcome
Also you can use the OS command /usr/bin/free to see how much free memory you
have on each node.
You should also see from the Spark GUI (first job on master node:4040, next on
4041, etc) the resources and Storage (memory usage) for each SparkSubmit job.
HTH
Dr Mich Talebzadeh
to NOT use local mode in prod.
Others may have different opinions on this.
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
is to try running the first one. Check the Web GUI
on 4040 to see the progress of this job. If you start the next JVM then,
assuming it is working, it will be using port 4041 and so forth.
In actual fact try the command "free" to see how much free memory you have.
HTH
Dr Mich Talebzadeh
OK that is good news. So briefly, how do you kick off spark-submit for each
(or SparkConf), in terms of memory/resource allocations?
Now what is the output of
/usr/bin/free
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 28 May 2016 at 17:41, Ted Yu <yuzhih...@gmail.c
ok they are submitted, but the latter one, 14302, is it doing anything?
can you check it with jmonitor or the logs created?
HTH
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view
GUI
[image: Inline images 1]
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 30 May 2016 at 10:1
but of course Spark has both plus
in-memory capability.
It would be interesting to see what version of TEZ works as the execution
engine with Hive.
Vendors are divided on this (use Hive with TEZ, or use Impala instead of
Hive, etc) as I am sure you already know.
Cheers,
Dr Mich Talebzadeh
LinkedIn
data). 80-20 rule?
In reality maybe just 2TB or the most recent partitions etc. The rest is cold
data.
cheers
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view
"amount_sold").as("TotalSales"))
val rs = s.join(t, "time_id")
  .join(c, "channel_id")
  .groupBy("calendar_month_desc", "channel_desc")
  .agg(sum("amount_sold").as("TotalSales"))
HTH
Dr Mich Talebzadeh
LinkedIn *
https://w
Hi Teng,
what version of spark are you using as the execution engine? Are you using a
vendor's product here?
thanks
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view
are you using JDBC in the spark shell?
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
http://talebzadehmich.wordpress.com
On 27 May 2016
Hi Ted,
do you mean Hive 2 with a Spark 2 snapshot build as the execution engine,
just binaries for the snapshot (all ok)?
Dr Mich Talebzadeh
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view