groups.google.com/forum/#!forum/shark-users
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Thu, Mar 6, 2014 at 8:08 PM, qingyang li liqingyang1...@gmail.com wrote:
Hi Yana, do you know if there is a mailing list for Shark?
In addition:
1. I have run LOAD DATA INPATH '/user/root/input/test.txt' INTO TABLE b; in
Shark. I think this will create an RDD in memory, right?
2. When I run free -g, the result shows that something has been stored in
memory. The file is almost 4 GB.
[root@bigdata001
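As far as I understand, a plain LOAD DATA INPATH into a non-cached table
moves the file within HDFS rather than building an in-memory RDD; caching
has to be requested explicitly. A minimal plain-Spark sketch of explicit
caching, reusing the path from the question:

    // Nothing is held in memory until an RDD is marked with cache()
    // and an action materializes it.
    val data = sc.textFile("/user/root/input/test.txt")
    data.cache()
    data.count()  // the first action pulls the ~4 GB file into memory
    // The web UI's Storage tab then shows how much of it is cached.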
Hi Patrick,
Thanks for your reply.
I am guessing even an array type will be registered automatically. Is this
correct?
Thanks,
Pradeep
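For what it's worth, a minimal sketch of explicit Kryo registration in
Spark 0.9 (MyClass and MyRegistrator are made-up names; registering the
array type explicitly is the safe route while the automatic behavior is
unconfirmed):

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.serializer.KryoRegistrator

    case class MyClass(id: Int)  // stand-in element type

    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo) {
        kryo.register(classOf[MyClass])         // the element type
        kryo.register(classOf[Array[MyClass]])  // the array type, explicitly
      }
    }

    // Before creating the SparkContext:
    System.setProperty("spark.serializer",
      "org.apache.spark.serializer.KryoSerializer")
    System.setProperty("spark.kryo.registrator", "MyRegistrator")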
Hi,
We are currently trying to migrate to Hadoop 2.2.0, and hence we have
installed Spark 0.9.0 and the pre-release version of Shark 0.9.0.
When we execute the script (script.txt,
http://apache-spark-user-list.1001560.n3.nabble.com/file/n2401/script.txt)
we get the following error.
Hi,
There is also an option to run Spark applications on top of Mesos in
fine-grained mode; then fair scheduling is possible (applications will run
in parallel, and Mesos is responsible for scheduling all tasks), so in a
sense all applications will progress in parallel, though obviously in total
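A minimal sketch of how that is selected when creating the context (the
Mesos master URL is a placeholder; in Spark 0.9 fine-grained is the default
and spark.mesos.coarse opts into coarse-grained):

    import org.apache.spark.SparkContext

    // Fine-grained mode: each Spark task runs as its own Mesos task,
    // letting Mesos interleave tasks from several applications fairly.
    System.setProperty("spark.mesos.coarse", "false")  // the default, shown for clarity
    val sc = new SparkContext("mesos://mesos-master:5050", "FineGrainedApp")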
What is wrong with this code?
A condensed set of this code works in the spark-shell.
It does not work when deployed via a jar.
def calcSimpleRetention(start: String, end: String,
    event1: String, event2: String): List[Double] = {
  val spd = new PipelineDate(start)
  val epd = new
Strike that. Figured it out. Don't you just hate it when you fire off an
email and you figure it out as it is being sent? ;)
Ognen
On 3/7/14, 12:41 PM, Ognen Duzlevski wrote:
What is wrong with this code?
A condensed set of this code works in the spark-shell.
It does not work when deployed
Was the issue with print?
Printing on the worker?
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Fri, Mar 7, 2014 at 10:43 AM, Ognen Duzlevski
og...@plainvanillagames.com wrote:
Strike that. Figured it out. Don't you just
Set them as environment variables at boot and configure both stacks to call
on that.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Fri, Mar 7, 2014 at 9:32 AM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
On
No.
It was a logical error.
val ev1rdd = f.filter(_.split(",")(0).split(":")(1).replace("\"", "") ==
  event1).map(line =>
  (line.split(",")(2).split(":")(1).replace("\"", ""), 1)).cache should have
mapped to ",0", not ",1".
I have had the most awful time figuring out these looped things. It
seems like it is next to
Most likely the job you are executing is not serializable; this typically
happens when you have a library that is not serializable. Are you using
any library like Joda-Time?
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On
Easiest is to use a queue, Kafka for example. So push your JSON request
string into Kafka, then connect Spark Streaming to Kafka, pull data from it,
and execute it. Spark Streaming will split up the jobs and pipeline the data.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi
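A rough sketch of that pipeline in Spark 0.9 (the ZooKeeper address, group
id, topic name, and handleRequest are all placeholders):

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    def handleRequest(json: String) { println(json) }  // placeholder handler

    val ssc = new StreamingContext("local[4]", "JsonRequests", Seconds(2))

    // Pull the JSON request strings that were pushed into Kafka.
    val requests = KafkaUtils.createStream(
      ssc,
      "zk-host:2181",             // placeholder ZooKeeper quorum
      "request-consumers",        // placeholder consumer group id
      Map("json-requests" -> 1))  // placeholder topic -> receiver threads

    // Each batch is split into tasks and pipelined across the cluster.
    requests.map(_._2).foreachRDD { rdd =>
      rdd.foreach(handleRequest)
    }

    ssc.start()
    ssc.awaitTermination()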
Hi Spark users,
Could someone help me out?
My company has a fully functioning Spark cluster with Shark running on
top of it (as part of the same cluster, on the same LAN). I'm
interested in running raw Spark code against it but am running into
the following issue: it seems like the machine
FWIW - I posted some notes to help people get started quickly with Spark on
C*.
http://brianoneill.blogspot.com/2014/03/spark-on-cassandra-w-calliope.html
(tnx again to Rohit and team for all of their help)
-brian
--
Brian O'Neill
CTO, Health Market Science (http://healthmarketscience.com)
Nice, thanks :)
Ognen
On 3/7/14, 2:48 PM, Brian O'Neill wrote:
FWIW - I posted some notes to help people get started quickly with
Spark on C*.
http://brianoneill.blogspot.com/2014/03/spark-on-cassandra-w-calliope.html
(tnx again to Rohit and team for all of their help)
-brian
--
Brian
Mayur, I have not thought of that. Yes, I use Joda-Time. What is the scope
that this serialization issue applies to? Only the method making a call
into / using such a library? The whole class that the method using such a
library belongs to? Sorry if it is a dumb question :)
Ognen
On 3/7/14, 1:29 PM,
Mayur,
So looking at the section on environment variables here
(http://spark.incubator.apache.org/docs/latest/configuration.html#environment-variables),
are you saying to set these options via SPARK_JAVA_OPTS -D? On a related
note, in looking around I just discovered this command line tool for
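For reference, a sketch of the programmatic equivalent in 0.9 (property
names are from the configuration page linked above; the values are made up):

    // Equivalent to SPARK_JAVA_OPTS="-Dspark.executor.memory=4g -Dspark.cores.max=8"
    System.setProperty("spark.executor.memory", "4g")
    System.setProperty("spark.cores.max", "8")
    val sc = new org.apache.spark.SparkContext("spark://master:7077", "MyApp")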
Hi,
I'm trying to run a Kafka stream and get a strange exception. The
stream is created by the following code:
val lines = KafkaUtils.createStream[String, VtrRecord,
  StringDecoder, VtrRecordDeserializer](ssc, kafkaParams.toMap,
  topicpMap, StorageLevel.MEMORY_AND_DISK_SER_2)
'VtrRecord'
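For context, the last two type parameters there must be Kafka Decoder
implementations. A sketch of the shape VtrRecordDeserializer would need
(the parsing logic is a placeholder; Spark's Kafka receiver instantiates
decoders reflectively through the one-argument VerifiableProperties
constructor, and a missing one is a common source of odd exceptions):

    import kafka.serializer.Decoder
    import kafka.utils.VerifiableProperties

    class VtrRecordDeserializer(props: VerifiableProperties = null)
        extends Decoder[VtrRecord] {
      def fromBytes(bytes: Array[Byte]): VtrRecord =
        parseVtrRecord(bytes)  // placeholder: however VtrRecord is built from bytes
    }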
So the whole function closure you want to apply on your RDD needs to be
serializable so that it can be serialized and sent to workers to operate on
the RDD. So Joda-Time objects cannot be serialized and sent, hence Joda-Time
is out of work. Two bad answers:
1. Initialize Joda-Time for each row and complete work
The driver contains the DAG scheduler, which manages stages of jobs and
needs to talk back and forth with workers. So you can run the driver on any
machine that can reach the master and workers (even your laptop). But the
driver will need to be reachable from all machines.
I think 0.9.0 added an ability for the driver to
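A small sketch of the knobs involved (host and port values are
placeholders; the properties are the standard spark.driver.* settings):

    // Workers must be able to connect back to the driver process, so
    // advertise an address they can reach before creating the context.
    System.setProperty("spark.driver.host", "driver.example.com")  // placeholder
    System.setProperty("spark.driver.port", "51000")               // placeholder
    val sc = new org.apache.spark.SparkContext("spark://master:7077", "MyApp")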
I am not sure how to debug this without any more information about the
source. Can you monitor on the receiver side that data is being accepted by
the receiver but not reported?
TD
On Wed, Mar 5, 2014 at 7:23 AM, eduardocalfaia e.costaalf...@unibs.it wrote:
Hi TD,
I have seen in the web UI
There is #3, which is to use mapPartitions and initialize one Joda-Time
object per partition; that is less overhead for large objects.
Sent from Mailbox for iPhone
On Sat, Mar 8, 2014 at 2:54 AM, Mayur Rustagi mayur.rust...@gmail.com
wrote:
So the whole function closure you want to apply on your RDD needs
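A sketch contrasting the per-row approach with #3 (the date pattern and
sample data are made up; DateTimeFormat is Joda-Time):

    import org.joda.time.format.DateTimeFormat

    val lines = sc.parallelize(Seq("2014-03-07", "2014-03-08"))

    // Per-row init works, but pays the formatter construction cost on
    // every single record.
    val perRow = lines.map { s =>
      DateTimeFormat.forPattern("yyyy-MM-dd").parseDateTime(s)
    }

    // #3: one formatter per partition, created on the worker, so nothing
    // non-serializable is captured in the closure and overhead stays low.
    val perPartition = lines.mapPartitions { iter =>
      val fmt = DateTimeFormat.forPattern("yyyy-MM-dd")
      iter.map(fmt.parseDateTime)
    }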