Could you do something like this prior to calling the action?
import org.apache.hadoop.fs.{FileSystem, Path}

// Create a FileSystem object from the Hadoop configuration
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
// exists() returns Boolean: true if the file exists, false if it doesn't
val fileExists = fs.exists(new Path("/path/to/file")) // path is illustrative
You may want to make sure you include the P4J jar and your plugins as
part of the following so that both the driver and executors have access.
If HDFS is out, then you could make a common mount point on each of the
executor nodes so they have access to the classes.
- spark-submit --jars
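A sketch of what that might look like (jar names and paths are illustrative):

spark-submit --jars /opt/jars/p4j.jar,/opt/jars/my-plugins.jar \
  --class com.example.Main my-app.jar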
A little late, but have you looked at https://livy.incubator.apache.org/?
It works well for us.
-Todd
On Thu, Mar 28, 2019 at 9:33 PM Jason Nerothin
wrote:
> Meant this one: https://docs.databricks.com/api/latest/jobs.html
>
> On Thu, Mar 28, 2019 at 5:06 PM Pat Ferrel wrote:
>
>> Thanks, are
Hi Tomas,
Have you considered using something like https://www.alluxio.org/ for your
cache? It seems like a possible solution for what you're trying to do.
-Todd
On Tue, Jan 15, 2019 at 11:24 PM 大啊 wrote:
> Hi ,Tomas.
> Thanks for your question, it gave me some prompt. But the best way to use cache
>
figuration=log4j-spark.properties" \
>--files "${JAAS_CONF},${KEYTAB}" \
>--class "${MAIN_CLASS}" \
>"${ARTIFACT_FILE}"
>
>
> The first batch is huge, even if it worked for the first batch I would've
> tried researching more. The
Hi Biplob,
How many partitions are on the topic you are reading from, and have you set
the maxRatePerPartition? IIRC, Spark back pressure is calculated as
follows:
*Spark back pressure:*
Back pressure is calculated off of the following:
• maxRatePerPartition=200
• batchInterval 30s
• 3
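For reference, a minimal sketch of the relevant settings (the values are the
ones from this thread; the config keys are the standard Spark ones):

import org.apache.spark.SparkConf

// maxRatePerPartition caps records per Kafka partition per second, so the
// first batch is bounded at maxRatePerPartition * batchInterval * partitions,
// e.g. 200 * 30 * 3 = 18,000 records.
val conf = new SparkConf()
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.kafka.maxRatePerPartition", "200")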
Hi Mich,
You could look at http://www.exasol.com/. It works very well with Tableau
without the need to extract the data. Also in V6, it has the virtual
schemas which would allow you to access data in Spark, Hive, Oracle, or
other sources.
It may be outside of what you are looking for, but it works
These types of questions would be better asked on the user mailing list for
the Spark Cassandra connector:
http://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user
Version compatibility can be found here:
Hi Mich,
Have you looked at Apache Ignite? https://apacheignite-fs.readme.io/docs.
This looks like something that may be what you're looking for:
http://apacheignite.gridgain.org/docs/data-analysis-with-apache-zeppelin
HTH.
-Todd
On Sat, Sep 17, 2016 at 12:53 PM, Mich Talebzadeh
Hi Mich,
Perhaps the issue is having multiple SparkContexts in the same JVM (
https://issues.apache.org/jira/browse/SPARK-2243).
While it is possible, I don't think it is encouraged.
As you know, the call you're currently invoking to create the
StreamingContext also creates a
SparkContext.
/** *
Have not tried this, but looks quite useful if one is using Druid:
https://github.com/implydata/pivot - An interactive data exploration UI
for Druid
On Tue, Aug 30, 2016 at 4:10 AM, Alonso Isidoro Roman
wrote:
> Thanks Mitch, i will check it.
>
> Cheers
>
>
> Alonso
Have you looked at spark-packages.org? There are several different HBase
connectors there; not sure if any meet your need or not.
https://spark-packages.org/?q=hbase
HTH,
-Todd
On Tue, Aug 30, 2016 at 5:23 AM, ayan guha wrote:
> You can use rdd level new hadoop format
This is due to a change in 1.6: by default, the Thrift server runs in
multi-session mode. You would want to set the following to true in your
Spark config (conf/spark-defaults.conf):

spark.sql.hive.thriftServer.singleSession  true
Good write up here:
You can set the dbtable to this:
.option("dbtable", "(select * from master_schema where 'TID' = '100_0')")
HTH,
Todd
On Thu, Jul 21, 2016 at 10:59 AM, sujeet jog wrote:
> I have a table of size 5GB, and want to load selective rows into dataframe
> instead of loading
quorum defined in
> config, running in standalone mode
> (org.apache.zookeeper.server.quorum.QuorumPeerMain)
>
> Any indication onto why the channel connection might be closed? Would it
> be Kafka or Zookeeper related?
>
> On 07 Jun 2016, at 14:07, Todd Nist <tsind...@gmail.c
What version of Spark are you using? I do not believe that 1.6.x is
compatible with 0.9.0.1 due to changes in the kafka clients between 0.8.2.2
and 0.9.0.x. See this for more information:
https://issues.apache.org/jira/browse/SPARK-12177
-Todd
On Tue, Jun 7, 2016 at 7:35 AM, Dominik Safaric
Perhaps these may be of some use:
https://github.com/mkuthan/example-spark
http://mkuthan.github.io/blog/2015/03/01/spark-unit-testing/
https://github.com/holdenk/spark-testing-base
On Wed, May 18, 2016 at 2:14 PM, swetha kasireddy wrote:
> Hi Lars,
>
> Do you have
I believe the class you are looking for is
org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala.
By default in savePartition(...), it will do the following:

if (supportsTransactions) {
  conn.setAutoCommit(false) // Everything in the same db transaction.
}

Then at line 224, it will
Have you looked at these:
http://allegro.tech/2015/08/spark-kafka-integration.html
http://mkuthan.github.io/blog/2016/01/29/spark-kafka-integration2/
Full example here:
https://github.com/mkuthan/example-spark-kafka
HTH.
-Todd
On Thu, Apr 21, 2016 at 2:08 PM, Alexander Gallego
I believe you can adjust it by setting the following:

spark.akka.timeout (default: 100s) - Communication timeout between Spark nodes.
HTH.
-Todd
On Thu, Apr 21, 2016 at 9:49 AM, yuemeng (A) wrote:
> When I run a Spark application, sometimes I get the following ERROR:
>
> 16/04/21 09:26:45
So there is an offering from Stratio, https://github.com/Stratio/Decision
Decision CEP engine is a Complex Event Processing platform built on Spark
> Streaming.
>
> It is the result of combining the power of Spark Streaming as a continuous
> computing framework and Siddhi CEP engine as complex
The updateStateByKey can be supplied an initialRDD to populate it with.
Per code (
https://github.com/apache/spark/blob/v1.4.0/streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala#L435-L445
).
Provided here for your convenience.
/**
* Return a new "state"
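A minimal sketch of the seeded form (ssc, pairs, and the update function are
illustrative):

import org.apache.spark.HashPartitioner

val initialRDD = ssc.sparkContext.parallelize(Seq(("a", 1), ("b", 2)))
val updateFunc: (Seq[Int], Option[Int]) => Option[Int] =
  (values, state) => Some(values.sum + state.getOrElse(0))
val stateDStream = pairs.updateStateByKey[Int](
  updateFunc,
  new HashPartitioner(ssc.sparkContext.defaultParallelism),
  initialRDD)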
Hi Vinti,
All of your tasks are failing based on the screen shots provided.
I think a few more details would be helpful. Is this YARN or a Standalone
cluster? How much overall memory is on your cluster? On each machine
where workers and executors are running? Are you using the Direct
Have you looked at Apache Toree, http://toree.apache.org/? This was
formerly the Spark-Kernel from IBM but was contributed to Apache.
https://github.com/apache/incubator-toree
You can find a good overview on the spark-kernel here:
You could also look at Apache Toree, http://toree.apache.org/
, github: https://github.com/apache/incubator-toree. This used to be the
Spark Kernel from IBM but has been contributed to Apache.
Good overview here on its features,
cluster ?
> Am I missing something obvious ?
>
>
> On Sun, Feb 28, 2016 at 7:01 PM, Todd Nist <tsind...@gmail.com> wrote:
>
>> Define your SparkConfig to set the master:
>>
>> val conf = new SparkConf().setAppName(AppName)
>> .setMaster(SparkMaster)
Define your SparkConfig to set the master:
val conf = new SparkConf()
  .setAppName(AppName)
  .setMaster(SparkMaster)
  // ...plus any other .set(key, value) options you need
Where SparkMaster = "spark://SparkServerHost:7077". So if your Spark
server hostname is "RADTech", then it would be "spark://RADTech:7077".
Then when you create
You could use the "withSessionDo" of the SparkCassandraConnector to perform
the simple insert:
CassandraConnector(conf).withSessionDo { session => session.execute() }
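Filled in, that might look like this (keyspace, table, and bind values are
illustrative):

import com.datastax.spark.connector.cql.CassandraConnector

CassandraConnector(conf).withSessionDo { session =>
  session.execute(
    "INSERT INTO my_ks.my_table (id, value) VALUES (?, ?)",
    "id-1", Int.box(42))
}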
-Todd
On Tue, Feb 16, 2016 at 11:01 AM, Cody Koeninger wrote:
> You could use sc.parallelize... but the
Hi Satish,
You should be able to do something like this:
val props = new java.util.Properties()
props.put("user", username)
props.put("password", pwd)
props.put("driver", "org.postgresql.Driver")
val deptNo = 10
val where = Some(s"dept_number = $deptNo")
// Completing the truncated line; the table name here is illustrative:
val df = sqlContext.read.jdbc(url, "departments", where.toArray, props)
I had a similar problem a while back and leveraged these Kryo serializers,
https://github.com/magro/kryo-serializers. I had to fall back to version
0.28, but that was a while back. You can add these to the
org.apache.spark.serializer.KryoRegistrator
and then set your registrator in the spark
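A minimal sketch of that wiring (the registrator class name is illustrative):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "com.example.MyKryoRegistrator")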
Hi Rajeshwar Gaini,
dbtable can be any valid SQL query; simply define it as a subquery,
something like:

val query = "(SELECT country, count(*) FROM customer GROUP BY country) as X"
val df1 = sqlContext.read
  .format("jdbc")
  .option("url", url)
  .option("user", username)
  .option("dbtable", query)
  .load()
Sorry, did not see your update until now.
On Fri, Jan 8, 2016 at 3:52 PM, Todd Nist <tsind...@gmail.com> wrote:
> Hi Yasemin,
>
> What version of Spark are you using? Here is the reference, it is off of
> the DataFrame
> https://spark.apache.org/docs/lates
that Todd mentioned or i cant find it.
> The code and error are in gist
> <https://gist.github.com/yaseminn/f5a2b78b126df71dfd0b>. Could you check
> it out please?
>
> Best,
> yasemin
>
> 2016-01-08 18:23 GMT+02:00 Todd Nist <tsind...@gmail.com>:
>
>> It
It is not clear from the information provided why the insertIntoJDBC failed
in #2. I would note that that method on the DataFrame has been deprecated
since 1.4; not sure what version you're on. You should be able to do
something like this:
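For example, on 1.4+ the non-deprecated path is the DataFrameWriter (the
table name and credentials are illustrative):

import java.util.Properties
import org.apache.spark.sql.SaveMode

val props = new Properties()
props.put("user", username)
props.put("password", pwd)
df.write.mode(SaveMode.Append).jdbc(url, "my_table", props)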
That should read "I think you're missing the --name option". Sorry about
that.
On Wed, Jan 6, 2016 at 3:03 PM, Todd Nist <tsind...@gmail.com> wrote:
> Hi Jade,
>
> I think you're missing the "--name" option. The make-distribution should look like this:
>
> ./make-distr
Hi Jade,
I think you're missing the "--name" option. The make-distribution should look like this:
./make-distribution.sh --name hadoop-2.6 --tgz -Pyarn -Phadoop-2.6
-Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests.
As for why it failed to build with scala 2.11, did you run the
i.apache.org/confluence/display/MAVEN/PluginExecutionException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the
> command
> [ERROR] mvn -rf :spark-launcher_2.10
>
> Do you think it’s java problem? I’m using oracle JDK 1.7. Should I update
> it to
Another possible alternative is to register a StreamingListener and then
reference the BatchInfo.numRecords; good example here,
https://gist.github.com/akhld/b10dc491aad1a2007183.
After registering the listener, simply implement the appropriate "onEvent"
method, where onEvent is onBatchStarted,
see https://issues.apache.org/jira/browse/SPARK-11043, it is resolved in
1.6.
On Tue, Dec 15, 2015 at 2:28 PM, Younes Naguib <
younes.nag...@tritondigital.com> wrote:
> The one coming with spark 1.5.2.
>
>
>
> y
>
>
>
> *From:* Ted Yu [mailto:yuzhih...@gmail.com]
> *Sent:* December-15-15 1:59 PM
Perhaps the new trackStateByKey targeted for 1.6 may help you here.
I'm not sure whether it made it into 1.6, as the jira does not
specify a fix version. The jira describing it is here:
https://issues.apache.org/jira/browse/SPARK-2629, and the design doc that
discusses the API
The default is to start applications with port 4040 and then increment them
by 1 as you are seeing, see docs here:
http://spark.apache.org/docs/latest/monitoring.html#web-interfaces
You can override this behavior by passing --conf spark.ui.port=4080
or by setting it in your code; something like
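For instance, a minimal sketch of the in-code route:

import org.apache.spark.SparkConf

// Equivalent to passing --conf spark.ui.port=4080 to spark-submit.
val conf = new SparkConf().set("spark.ui.port", "4080")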
Hi Abhi,
You should be able to register an
org.apache.spark.streaming.scheduler.StreamingListener.
There is an example here that may help:
https://gist.github.com/akhld/b10dc491aad1a2007183 and the spark api docs
here,
override def onBatchSubmitted(batchSubmitted: StreamingListenerBatchSubmitted): Unit = {
  println("Start time: " + batchSubmitted.batchInfo.processingStartTime)
}
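Wiring that into a StreamingContext would look roughly like this (ssc is
assumed):

import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchSubmitted}

ssc.addStreamingListener(new StreamingListener {
  override def onBatchSubmitted(
      batchSubmitted: StreamingListenerBatchSubmitted): Unit =
    println("Submission time: " + batchSubmitted.batchInfo.submissionTime)
})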
Sorry for the confusion.
-Todd
On Tue, Nov 24, 2015 at 7:51 PM, Todd Nist <tsind...@gmail.com> wrote:
> Hi Abhi,
>
> You s
I issued the same basic command and it worked fine.
RADTech-MBP:spark $ ./make-distribution.sh --name hadoop-2.6 --tgz -Pyarn
-Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests
Which created: spark-1.6.0-SNAPSHOT-bin-hadoop-2.6.tgz in the root
directory of the project.
2.11 artifacts are in fact published:
> http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-parent_2.11%22
>
> On Sun, Oct 25, 2015 at 7:37 PM, Todd Nist <tsind...@gmail.com> wrote:
> > Sorry Sean, you are absolutely right, it supports 2.11; all I meant is
> there is
> >
Hi Bilnmek,
Spark 1.5.x does not support Scala 2.11.7, so the easiest thing to do is
build it like you're trying. Here are the steps I followed to build it on a
Mac OS X 10.10.5 environment; it should be very similar on Ubuntu.
1. set the JAVA_HOME environment variable in my bash session via export
t support 2.11? It does.
>
> It is not even this difficult; you just need a source distribution,
> and then run "./dev/change-scala-version.sh 2.11" as you say. Then
> build as normal
>
> On Sun, Oct 25, 2015 at 4:00 PM, Todd Nist <tsind...@gmail.com
> <javascrip
Hi Yifan,
You could also try increasing spark.kryoserializer.buffer.max.mb:

spark.kryoserializer.buffer.max.mb (64 MB by default): useful if your
default buffer size goes beyond 64 MB.
Per doc:
Maximum allowable size of Kryo serialization buffer. This must be larger
than any object
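e.g., a sketch of bumping it (256 is illustrative; pick a value larger than
your biggest serialized object):

import org.apache.spark.SparkConf

val conf = new SparkConf().set("spark.kryoserializer.buffer.max.mb", "256")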
From Tableau, you should be able to use the Initial SQL option to support
this:
So in Tableau add the following to the “Initial SQL”
create function myfunc AS 'myclass'
using jar 'hdfs:///path/to/jar';
HTH,
Todd
On Mon, Oct 19, 2015 at 11:22 AM, Deenar Toraskar
Hi Kali,
If you do not mind sending JSON, you could do something like this, using
json4s:
// Requires json4s, e.g.:
// import org.json4s.native.JsonMethods.parse
// import org.json4s.native.Serialization.{write, writePretty}
// implicit val formats = org.json4s.DefaultFormats
val rows = p.collect() map (row => TestTable(row.getString(0), row.getString(1)))
val json = parse(write(rows))
producer.send(new KeyedMessage[String, String]("trade", writePretty(json)))
// or for
Stratio offers a CEP implementation based on Spark Streaming and the Siddhi
CEP engine. I have not used the below, but they may be of some value to
you:
http://stratio.github.io/streaming-cep-engine/
https://github.com/Stratio/streaming-cep-engine
HTH.
-Todd
On Sun, Sep 13, 2015 at 7:49 PM,
https://issues.apache.org/jira/browse/SPARK-8360?jql=project%20%3D%20SPARK%20AND%20text%20~%20Streaming
-Todd
On Thu, Sep 10, 2015 at 10:22 AM, Gurvinder Singh <
gurvinder.si...@uninett.no> wrote:
> On 09/10/2015 07:42 AM, Tathagata Das wrote:
> > Rewriting is necessary. You will have to
on a streaming app ?
Thanks again.
Daniel
On Thu, Aug 6, 2015 at 1:53 AM, Todd Nist tsind...@gmail.com wrote:
Hi Danniel,
It is possible to create an instance of the SparkSQL Thrift server,
however seems like this project is what you may be looking for:
https://github.com/Intel-bigdata/spark
They are covered here in the docs:
http://spark.apache.org/docs/1.4.1/api/scala/index.html#org.apache.spark.sql.functions$
On Thu, Aug 6, 2015 at 5:52 AM, Netwaver wanglong_...@163.com wrote:
Hi All,
I am using Spark 1.4.1, and I want to know how can I find the
complete function
Hi Danniel,
It is possible to create an instance of the SparkSQL Thrift server, however
seems like this project is what you may be looking for:
https://github.com/Intel-bigdata/spark-streamingsql
Not 100% sure what your use case is, but you can always convert the data
into a DF and then issue a query
There is one package available on the spark-packages site,
http://spark-packages.org/package/Stratio/RabbitMQ-Receiver
The source is here:
https://github.com/Stratio/RabbitMQ-Receiver
Not sure that meets your needs or not.
-Todd
On Mon, Jul 20, 2015 at 8:52 AM, Jeetendra Gangele
Did you take a look at the excellent write up by Yin Huai and Michael
Armbrust? It appears that rank is supported in the 1.4.x release.
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
Snippet from above article for your convenience:
To answer the first
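The pattern from the article looks roughly like this (column names are
illustrative):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, rank}

val w = Window.partitionBy("category").orderBy(col("revenue").desc)
val ranked = df.select(col("*"), rank().over(w).as("rank"))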
There are three connector packages listed on the spark-packages web site:
http://spark-packages.org/?q=hbase
HTH.
-Todd
On Wed, Jul 15, 2015 at 2:46 PM, Shushant Arora shushantaror...@gmail.com
wrote:
Hi
I have a requirement of writing in hbase table from Spark streaming app
after some
I would strongly encourage you to read the docs; they are very useful in
getting up and running:
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/0_quick_start.md
For your use case shown above, you will need to ensure that you include the
appropriate version of the
foreachRDD returns Unit:

def foreachRDD(foreachFunc: RDD[T] => Unit): Unit
(see https://spark.apache.org/docs/latest/api/scala/org/apache/spark/rdd/RDD.html)

Apply a function to each RDD in this DStream. This is an output operator,
so 'this' DStream will be registered as an output stream and
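A typical use, sketched (dstream and the sink are illustrative):

dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    // Open any connection once per partition, not once per record.
    partition.foreach(record => println(record))
  }
}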
to be a limitation at this time.
-Todd
On Thu, Jul 2, 2015 at 4:13 PM, Mulugeta Mammo mulugeta.abe...@gmail.com
wrote:
thanks but my use case requires I specify different start and max heap
sizes. Looks like spark sets start and max sizes to the same value.
On Thu, Jul 2, 2015 at 1:08 PM, Todd Nist tsind
You should use:
spark.executor.memory
from the docs https://spark.apache.org/docs/latest/configuration.html:
spark.executor.memory (default: 512m) - Amount of memory to use per executor
process, in the same format as JVM memory strings (e.g. 512m, 2g).
-Todd
On Thu, Jul 2, 2015 at 3:36 PM, Mulugeta Mammo
You can get HDP with at least 1.3.1 from Horton:
http://hortonworks.com/hadoop-tutorial/using-apache-spark-technical-preview-with-hdp-2-2/
for your convenience, from the docs:
wget -nv
http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.2.4.4/hdp.repo
-O /etc/yum.repos.d/HDP-TP.repo
Hi Proust,
Is it possible to see the query you are running, and can you run EXPLAIN
EXTENDED to show the physical plan for the query? To generate the plan you
can do something like this from $SPARK_HOME/bin/beeline:

0: jdbc:hive2://localhost:10001> explain extended select * from YourTableHere;
It was released yesterday.
On Friday, June 12, 2015, ayan guha guha.a...@gmail.com wrote:
Hi
When is official spark 1.4 release date?
Best
Ayan
Hi Gaurav,
Seems like you could use a broadcast variable for this, if I understand your
use case. Create it in the driver based on the CommandLineArguments and
then use it in the workers.
https://spark.apache.org/docs/latest/programming-guide.html#broadcast-variables
So something like:
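A sketch (the argument class and its use are illustrative):

// Parse once in the driver, broadcast, then read on the workers.
case class CommandLineArguments(threshold: Int)
val cliArgs = CommandLineArguments(args(0).toInt)
val bcArgs = sc.broadcast(cliArgs)
val filtered = rdd.filter(_ >= bcArgs.value.threshold)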
There use to be a project, StreamSQL (
https://github.com/thunderain-project/StreamSQL), but it appears a bit
dated and I do not see it in the Spark repo, but may have missed it.
@TD Is this project still active?
I'm not sure what the status is but it may provide some insights on how to
achieve
://datastax-oss.atlassian.net/browse/SPARKC-98 is still open...
On Fri, May 22, 2015 at 6:15 PM, Todd Nist tsind...@gmail.com wrote:
I'm using the spark-cassandra-connector from DataStax in a spark
streaming job launched from my own driver. It is connecting to a standalone
cluster on my local box which
I'm using the spark-cassandra-connector from DataStax in a spark streaming
job launched from my own driver. It is connecting to a standalone cluster
on my local box which has two workers running.
This is Spark 1.3.1 and spark-cassandra-connector-1.3.0-SNAPSHOT. I have
added the following entry to
From the docs,
https://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence:
Storage Level: MEMORY_ONLY
Meaning: Store RDD as deserialized Java objects in the JVM. If the RDD does
not fit in memory, some partitions will not be cached and will be recomputed
on the fly each time they're
I believe you're looking for df.na.fill in Scala; in the pySpark module it is
fillna (http://spark.apache.org/docs/latest/api/python/pyspark.sql.html)
from the docs:
df4.fillna({'age': 50, 'name': 'unknown'}).show()

age  height  name
10   80      Alice
5    null    Bob
50   null    Tom
50   null    unknown
On
You may want to look at this tooling for helping identify performance
issues and bottlenecks:
https://github.com/kayousterhout/trace-analysis
I believe this is slated to become part of the web ui in the 1.4 release,
in fact based on the status of the JIRA,
I believe what Dean Wampler was suggesting is to use the sqlContext not the
sparkContext (sc), which is where the createDataFrame function resides:
https://spark.apache.org/docs/1.3.1/api/scala/index.html#org.apache.spark.sql.SQLContext
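i.e., a one-line sketch (assuming an RDD of case classes):

val df = sqlContext.createDataFrame(rdd) // not sc.createDataFrame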
HTH.
-Todd
On Wed, May 13, 2015 at 6:00 AM, SLiZn Liu
Have you tried to set the following?
spark.worker.cleanup.enabled=true
spark.worker.cleanup.appDataTtl=<seconds>
On Thu, May 7, 2015 at 2:39 AM, Taeyun Kim taeyun@innowireless.com
wrote:
Hi,
After a spark program completes, there are 3 temporary directories remain
in the temp
Are you using Kryo or Java serialization? I found this post useful:
http://stackoverflow.com/questions/23962796/kryo-readobject-cause-nullpointerexception-with-arraylist
If using Kryo, you need to register the classes with Kryo, something like
this (registerKryoClasses lives on SparkConf; the class names are illustrative):

conf.registerKryoClasses(Array(
  classOf[MyClass1],
  classOf[MyClass2]))
Hi,
I have a DataFrame that represents my data and looks like this:

+----------+-----------+
| col_name | data_type |
+----------+-----------+
| obj_id   | string    |
| type     | string    |
| name
*Resending as I do not see that this made it to the mailing list; sorry if
in fact it did and is just not reflected online yet.*
I’m very perplexed with the following. I have a set of AVRO generated
objects that are sent to a SparkStreaming job via Kafka. The SparkStreaming
job follows the
I’m very perplexed with the following. I have a set of AVRO generated
objects that are sent to a SparkStreaming job via Kafka. The SparkStreaming
job follows the receiver-based approach. I am encountering the below error
when I attempt to deserialize the payload:
15/04/30 17:49:25 INFO
Can you simply apply the
https://spark.apache.org/docs/1.3.1/api/scala/index.html#org.apache.spark.util.StatCounter
to this? You should be able to do something like this:
val stats = RDD.map(x => x._2).stats()
-Todd
On Tue, Apr 28, 2015 at 10:00 AM, subscripti...@prismalytics.io
I think the docs are correct. If you follow the example from the docs and add
the import shown below, I believe you will get what you're looking for:

// This is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._
You could also simply take your rdd and do the following:
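For example (assuming an RDD of case classes and the implicits import above):

import sqlContext.implicits._

val df = rdd.toDF()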
down where
the dependency was coming from. Based on Patrick comments it sound like
this is now resolved.
Sorry for the confusion.
-Todd
On Wed, Apr 8, 2015 at 4:38 PM, Todd Nist tsind...@gmail.com wrote:
Hi Mohammed,
I think you just need to add -DskipTests to your build. Here is how I
built
To use the HiveThriftServer2.startWithContext, I thought one would use the
following artifact in the build:
"org.apache.spark" %% "spark-hive-thriftserver" % "1.3.0"
But I am unable to resolve the artifact. I do not see it in maven central
or any other repo. Do I need to build Spark and
org.apache.spark#spark-network-shuffle_2.10;1.3.0 test
[error] Total time: 106 s, completed Apr 8, 2015 12:33:45 PM
Mohammed
*From:* Michael Armbrust [mailto:mich...@databricks.com]
*Sent:* Wednesday, April 8, 2015 11:54 AM
*To:* Mohammed Guller
*Cc:* Todd Nist; James Aley; user; Patrick
In 1.2.1 I was persisting a set of parquet files as a table for use by the
spark-sql cli later on. There was a post here
http://apache-spark-user-list.1001560.n3.nabble.com/persist-table-schema-in-spark-sql-tt16297.html#a16311
by Michael Armbrust that provided a nice little helper method for dealing
is download location ?
On Fri, Apr 3, 2015 at 3:42 PM, Todd Nist tsind...@gmail.com wrote:
Started the spark shell with the one jar from hive suggested:
./bin/spark-shell --master spark://radtech.io:7077 --total-executor-cores 2
--driver-class-path /usr/local/spark/lib/mysql-connector-java
definition (code) of UDF json_tuple. That should solve
your problem.
On Fri, Apr 3, 2015 at 3:57 PM, Todd Nist tsind...@gmail.com wrote:
I placed it there. It was downloaded from MySql site.
On Fri, Apr 3, 2015 at 6:25 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com
wrote:
Akhil
you mentioned /usr/local
Thanks
Best Regards
On Fri, Apr 3, 2015 at 2:55 PM, Todd Nist tsind...@gmail.com wrote:
Hi Akhil,
This is for version 1.2.1. Well the other thread that you reference was
me attempting it in 1.3.0 to see if the issue was related to 1.2.1. I did
not build Spark but used the version from
What version of Cassandra are you using? Are you using DSE or the stock
Apache Cassandra version? I have connected it with DSE, but have not
attempted it with the standard Apache Cassandra version.
FWIW,
in Tableau using the ODBC driver that comes with DSE. Once
you connect, Tableau allows you to use the C* keyspace as a schema and column
families as tables.
Mohammed
*From:* pawan kumar [mailto:pkv...@gmail.com]
*Sent:* Friday, April 3, 2015 7:41 AM
*To:* Todd Nist
*Cc:* user@spark.apache.org; Mohammed
@Pawan
Not sure if you have seen this or not, but here is a good example by
Jonathan Lacefield of Datastax on hooking up SparkSQL with DSE; adding
Tableau is as simple as Mohammed stated with DSE.
https://github.com/jlacefie/sparksqltest.
HTH,
Todd
On Fri, Apr 3, 2015 at 2:39 PM, Todd Nist
are in the remote node. I
am not sure if I need to install spark and its dependencies in the webui
(Zeppelin) node.
I am not sure talking about Zeppelin in this thread is right.
Thanks once again for all the help.
Thanks,
Pawan Venugopal
On Fri, Apr 3, 2015 at 11:48 AM, Todd Nist tsind
CalliopeServer2, which works like a charm with BI tools that
use JDBC, but unfortunately Tableau throws an error when it connects to it.
Mohammed
*From:* Todd Nist [mailto:tsind...@gmail.com]
*Sent:* Friday, April 3, 2015 11:39 AM
*To:* pawan kumar
*Cc:* Mohammed Guller; user@spark.apache.org
Hi Young,
Sorry for the duplicate post, want to reply to all.
I just downloaded the bits prebuilt from the Apache Spark download site.
Started the spark shell and got the same error.
I then started the shell as follows:
./bin/spark-shell --master spark://radtech.io:7077 --total-executor-cores 2
. If you want the
specific jar, you could look for jackson or the json serde in it.
Thanks
Best Regards
On Thu, Apr 2, 2015 at 12:49 AM, Todd Nist tsind...@gmail.com wrote:
I have a feeling I'm missing a Jar that provides the support, or this
may be related to https://issues.apache.org/jira
I was trying a simple test from the spark-shell to see if 1.3.0 would
address a problem I was having with locating the json_tuple class and got
the following error:
scala> import org.apache.spark.sql.hive._
import org.apache.spark.sql.hive._

scala> val sqlContext = new HiveContext(sc)
sqlContext:
I am accessing ElasticSearch via the elasticsearch-hadoop and attempting to
expose it via SparkSQL. I am using spark 1.2.1, latest supported by
elasticsearch-hadoop, and "org.elasticsearch" % "elasticsearch-hadoop" %
"2.1.0.BUILD-SNAPSHOT" of elasticsearch-hadoop. I'm
encountering an issue when I
Here are a few ways to achieve what you're looking to do:
https://github.com/cjnolet/spark-jetty-server
Spark Job Server - https://github.com/spark-jobserver/spark-jobserver -
defines a REST API for Spark
Hue -
at 3:26 PM, Todd Nist tsind...@gmail.com wrote:
I am accessing ElasticSearch via the elasticsearch-hadoop and attempting
to expose it via SparkSQL. I am using spark 1.2.1, latest supported by
elasticsearch-hadoop, and org.elasticsearch % elasticsearch-hadoop %
2.1.0.BUILD-SNAPSHOT of elasticsearch
Perhaps this project, https://github.com/calrissian/spark-jetty-server,
could help with your requirements.
On Tue, Mar 24, 2015 at 7:12 AM, Jeffrey Jedele jeffrey.jed...@gmail.com
wrote:
I don't think there's a general approach to that - the use cases are just
too different. If you really need
:
Seems the elasticsearch-hadoop project was built with an old version of
Spark, and then you upgraded the Spark version in execution env, as I know
the StructField changed the definition in Spark 1.2, can you confirm the
version problem first?
*From:* Todd Nist [mailto:tsind...@gmail.com]
*Sent