Re: passing SparkContext as parameter

2015-09-21 Thread Ted Yu
You can use a broadcast variable for passing connection information. Cheers > On Sep 21, 2015, at 4:27 AM, Priya Ch wrote: > > Can I use this SparkContext on executors ?? > In my application, I have a scenario of reading from db for certain records in > rdd. Hence I
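A minimal sketch of this pattern, assuming a Spark shell session with an rdd of records to look up: the connection parameters are broadcast, and the connection itself is created on the executor, since a SparkContext cannot be used inside tasks. The JDBC URL is hypothetical.

```scala
import java.sql.DriverManager

val connUrl = sc.broadcast("jdbc:mysql://dbhost:3306/mydb")  // hypothetical URL

rdd.foreachPartition { partition =>
  // Created once per partition on the executor; the SparkContext is NOT usable here.
  val conn = DriverManager.getConnection(connUrl.value)
  try {
    partition.foreach { record =>
      // ... look up / write each record using conn ...
    }
  } finally {
    conn.close()
  }
}
```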

Re: Spark does not yet support its JDBC component for Scala 2.11.

2015-09-21 Thread Ted Yu
build instruction for 2.11 is obsolete? Or there are still >> some limitations? >> >> >> http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211 >> >> On Fri, Sep 11, 2015 at 2:09 PM, Ted Yu <yuzhih...@gmail.com> wrote: >>

Re: Troubleshooting "Task not serializable" in Spark/Scala environments

2015-09-21 Thread Ted Yu
Which release are you using ? From the line number in ClosureCleaner, it seems you're using 1.4.x Cheers On Mon, Sep 21, 2015 at 4:07 PM, Balaji Vijayan wrote: > Howdy, > > I'm a relative novice at Spark/Scala and I'm puzzled by some behavior that > I'm seeing in

Re: Exception initializing JavaSparkContext

2015-09-21 Thread Ted Yu
bq. hadoop-core-0.20.204.0 How come the above got into play - it was from hadoop-1 On Mon, Sep 21, 2015 at 11:34 AM, Ellen Kraffmiller < ellen.kraffmil...@gmail.com> wrote: > I am including the Spark core dependency in my maven pom.xml: > > <dependency> > <groupId>org.apache.spark</groupId> > <artifactId>spark-core_2.10</artifactId> > <version>1.5.0</version> > </dependency> >

Re: Problem at sbt/sbt assembly

2015-09-20 Thread Ted Yu
Have you seen this thread: http://search-hadoop.com/m/q3RTtVJJ3I15OJ251 Cheers On Sun, Sep 20, 2015 at 6:11 PM, Aaroncq4 <475715...@qq.com> wrote: > When I used “sbt/sbt assembly" to compile spark code of spark-1.5.0, I got a > problem and I did not know why. It signs that: > > NOTE: The sbt/sbt

Re: question building spark in a virtual machine

2015-09-19 Thread Ted Yu
Can you tell us how you configured the JVM heap size ? Which version of Java are you using ? When I build Spark, I do the following: export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m" Cheers On Sat, Sep 19, 2015 at 5:31 AM, Eyal Altshuler

Re: question building spark in a virtual machine

2015-09-19 Thread Ted Yu
>> wrote. >> My java version is 1.7.0_75. >> I didn't customize the JVM heap size specifically. Is there an >> additional configuration I have to run besides the MAVEN_OPTS configuration? >> >> Thanks, >> Eyal >> >> On Sat, Sep 19, 2015 at 5:29 PM

Re: What's the best practice to parse JSON using spark

2015-09-18 Thread Ted Yu
For #2, please see: examples/src/main/scala/org/apache/spark/examples/HBaseTest.scala examples/src/main/scala/org/apache/spark/examples/pythonconverters/HBaseConverters.scala In HBase, there is an hbase-spark module which is being polished. Should be available in the hbase 1.3.0 release. Cheers On

Re: What's the best practice to parse JSON using spark

2015-09-18 Thread Ted Yu
For #1, see this thread: http://search-hadoop.com/m/q3RTti0Thneenne2 For #2, also see: examples/src/main/python/hbase_inputformat.py examples/src/main/python/hbase_outputformat.py Cheers On Fri, Sep 18, 2015 at 5:12 PM, Ted Yu <yuzhih...@gmail.com> wrote: > For #2, please see: >

Re: Spark Streaming checkpoint recovery throws Stack Overflow Error

2015-09-18 Thread Ted Yu
Which version of Java are you using ? And release of Spark, please. Thanks On Fri, Sep 18, 2015 at 9:15 AM, swetha wrote: > Hi, > > When I try to recover my Spark Streaming job from a checkpoint directory, I > get a StackOverFlow Error as shown below. Any idea as to

Re: WAL on S3

2015-09-17 Thread Ted Yu
I assume you don't use Kinesis. Are you running Spark 1.5.0 ? If you must use S3, is switching to Kinesis possible ? Cheers On Thu, Sep 17, 2015 at 1:09 PM, Michal Čizmazia wrote: > How to make Write Ahead Logs to work with S3? Any pointers welcome! > > It seems as a known

Re: Suggested Method for Execution of Periodic Actions

2015-09-16 Thread Ted Yu
bq. and check if 5 minutes have passed What if the duration for the window is longer than 5 minutes ? Cheers On Wed, Sep 16, 2015 at 1:25 PM, Adrian Tanase wrote: > If you don't need the counts in betweem the DB writes, you could simply > use a 5 min window for the

Re: Iterating over JavaRDD

2015-09-16 Thread Ted Yu
How about using this method: * Return a new RDD by applying a function to all elements of this RDD. */ def mapToDouble[R](f: DoubleFunction[T]): JavaDoubleRDD = { new JavaDoubleRDD(rdd.map(x => f.call(x).doubleValue())) On Wed, Sep 16, 2015 at 8:30 PM, Tapan Sharma

Re: Table is modified by DataFrameWriter

2015-09-16 Thread Ted Yu
Can you tell us which release you were using ? Thanks > On Sep 16, 2015, at 7:11 PM, "guoqing0...@yahoo.com.hk" > wrote: > > Hi all, > I found the table structure was modified when using DataFrameWriter.jdbc to > save the content of a DataFrame , > >

Re: Dynamic Workflow Execution using Spark

2015-09-15 Thread Ted Yu
See this thread: http://search-hadoop.com/m/q3RTtUz0cyiPjYX On Tue, Sep 15, 2015 at 1:19 PM, Ashish Soni wrote: > Hi All , > > Are there any framework which can be used to execute workflows with in > spark or Is it possible to use ML Pipeline for workflow execution but

Re: Spark job failed

2015-09-14 Thread Ted Yu
Have you considered posting on vendor forum ? FYI On Mon, Sep 14, 2015 at 6:09 AM, Renu Yadav wrote: > > -- Forwarded message -- > From: Renu Yadav > Date: Mon, Sep 14, 2015 at 4:51 PM > Subject: Spark job failed > To:

Re: Stopping SparkContext and HiveContext

2015-09-13 Thread Ted Yu
For #1, there is the following method: @DeveloperApi def getExecutorStorageStatus: Array[StorageStatus] = { assertNotStopped() You can wrap the call in a try block catching IllegalStateException. Of course, this is just a workaround. FYI On Sun, Sep 13, 2015 at 1:48 AM, Ophir Cohen
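A sketch of that workaround, probing whether the context is stopped by catching the IllegalStateException thrown by assertNotStopped():

```scala
// Returns false once the SparkContext has been stopped; the helper name is ours.
def contextIsAlive(sc: org.apache.spark.SparkContext): Boolean =
  try {
    sc.getExecutorStorageStatus  // calls assertNotStopped() internally
    true
  } catch {
    case _: IllegalStateException => false
  }
```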

Re: Data lost in spark streaming

2015-09-13 Thread Ted Yu
Can you retrieve log for appattempt_1440495451668_0258_01 and see if there is some clue there ? Cheers On Sun, Sep 13, 2015 at 3:28 AM, Bin Wang wrote: > There is some error logs in the executor and I don't know if it is related: > > 15/09/11 10:54:05 WARN ipc.Client:

Re: Stopping SparkContext and HiveContext

2015-09-13 Thread Ted Yu
Please also see this thread: http://search-hadoop.com/m/q3RTtGpLeLyv97B1 On Sun, Sep 13, 2015 at 9:49 AM, Ted Yu <yuzhih...@gmail.com> wrote: > For #1, there is the following method: > > @DeveloperApi > def getExecutorStorageStatus: Array[StorageStatus] = { > asser

Re: Is there any Spark SQL reference manual?

2015-09-11 Thread Ted Yu
You may have seen this: https://spark.apache.org/docs/latest/sql-programming-guide.html Please suggest what should be added. Cheers On Fri, Sep 11, 2015 at 3:43 AM, vivek bhaskar wrote: > Hi all, > > I am looking for a reference manual for Spark SQL some thing like many >

Re: Spark does not yet support its JDBC component for Scala 2.11.

2015-09-11 Thread Ted Yu
Have you looked at: https://issues.apache.org/jira/browse/SPARK-8013 > On Sep 11, 2015, at 4:53 AM, Petr Novak wrote: > > Does it still apply for 1.5.0? > > What actual limitation does it mean when I switch to 2.11? No JDBC > Thriftserver? No JDBC DataSource? No

Re: Spark 1.5.0 java.lang.OutOfMemoryError: PermGen space

2015-09-11 Thread Ted Yu
Have you seen this thread ? http://search-hadoop.com/m/q3RTtPPuSvBu0rj2 > On Sep 11, 2015, at 3:00 AM, Jagat Singh wrote: > > Hi, > > We have queries which were running fine on 1.4.1 system. > > We are testing upgrade and even simple query like > val t1=

Re: countApproxDistinctByKey in python

2015-09-11 Thread Ted Yu
It has not been ported yet. On Fri, Sep 11, 2015 at 4:13 PM, LucaMartinetti wrote: > Hi, > > I am trying to use countApproxDistinctByKey in pyspark but cannot find it. > > >
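For reference, a minimal sketch of the Scala API that has not yet been ported; the sample data is illustrative, and relativeSD controls the accuracy of the HyperLogLog-based estimate:

```scala
val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("a", 1), ("b", 3)))
val distinctPerKey = pairs.countApproxDistinctByKey(relativeSD = 0.05)
distinctPerKey.collect()  // e.g. Array((a,2), (b,1)) -- the counts are estimates
```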

Re: Exception Handling : Spark Streaming

2015-09-11 Thread Ted Yu
Was your intention that exception from rdd.saveToCassandra() be caught ? In that case you can place try / catch around that call. Cheers On Fri, Sep 11, 2015 at 7:30 AM, Samya wrote: > Hi Team, > > I am facing this issue where in I can't figure out why the exception is
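A minimal sketch of that suggestion, assuming the spark-cassandra-connector implicits are in scope; the keyspace and table names are hypothetical:

```scala
import com.datastax.spark.connector._

dstream.foreachRDD { rdd =>
  try {
    rdd.saveToCassandra("my_keyspace", "my_table")  // hypothetical keyspace / table
  } catch {
    case e: Exception =>
      // log and handle the failure instead of letting the streaming job die
      println(s"Cassandra write failed: ${e.getMessage}")
  }
}
```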

Re: Is there any Spark SQL reference manual?

2015-09-11 Thread Ted Yu
tor.syntactical.StandardTokenParsers. > > Thanks, > -Rick > > vivekw...@gmail.com wrote on 09/11/2015 05:05:47 AM: > > > From: vivek bhaskar <vivekw...@gmail.com> > > To: Ted Yu <yuzhih...@gmail.com> > > Cc: user <user@spark.apache.org> > >

Re: How to enable Tungsten in Spark 1.5 for Spark SQL?

2015-09-10 Thread Ted Yu
Please see the following in sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala : val TUNGSTEN_ENABLED = booleanConf("spark.sql.tungsten.enabled", defaultValue = Some(true), doc = "When true, use the optimized Tungsten physical execution backend which explicitly " +
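Given that default, Tungsten is already on in 1.5. A short sketch of toggling the flag explicitly, assuming a Spark 1.5 sqlContext:

```scala
sqlContext.setConf("spark.sql.tungsten.enabled", "true")   // the 1.5 default
// or fall back to the pre-Tungsten execution backend:
sqlContext.setConf("spark.sql.tungsten.enabled", "false")
```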

Re: ArrayIndexOutOfBoundsException when using repartitionAndSortWithinPartitions()

2015-09-10 Thread Ted Yu
if (o1.hits == o2.hits) { > return 0; > } else if (o1.hits > o2.hits) { > return -1; > } else { > return 1; > } > } > > } > > ... > > > > Thanks, > Ashish > > On Wed, Sep 9, 2015 at 5:13 PM, Ted Yu <yuzhih...@gm

Re: ArrayIndexOutOfBoundsException when using repartitionAndSortWithinPartitions()

2015-09-10 Thread Ted Yu
f partitions. Fixed that. > > Thanks, > Ashish > > On Thu, Sep 10, 2015 at 10:44 AM, Ted Yu <yuzhih...@gmail.com> wrote: > >> Here is snippet of ExternalSorter.scala where ArrayIndexOutOfBoundsException >> was thrown: >> >> while (iterator.
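For reference, a sketch of the pitfall fixed above: getPartition must return a value in [0, numPartitions), otherwise ExternalSorter can index past its buffers. The partitioner name is hypothetical, and pairRdd stands for an RDD of key/value pairs whose key type has an Ordering:

```scala
import org.apache.spark.Partitioner

class HitsPartitioner(parts: Int) extends Partitioner {  // hypothetical name
  override def numPartitions: Int = parts
  override def getPartition(key: Any): Int = {
    val mod = key.hashCode % parts
    if (mod < 0) mod + parts else mod  // keep the index within [0, parts)
  }
}

val sorted = pairRdd.repartitionAndSortWithinPartitions(new HitsPartitioner(16))
```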

Re: java.lang.NoSuchMethodError and yarn-client mode

2015-09-09 Thread Ted Yu
Have you checked the contents of __app__.jar ? > On Sep 9, 2015, at 3:28 AM, Tom Seddon wrote: > > Thanks for your reply Aniket. > > Ok I've done this and I'm still confused. Output from running locally shows: > >

Re: I am very new to Spark. I have a very basic question. I have an array of values: listofECtokens: Array[String] = Array(EC-17A5206955089011B, EC-17A5206955089011A) I want to filter an RDD for all o

2015-09-09 Thread Ted Yu
Prachicsa: If the number of EC tokens is high, please consider using a set instead of an array for better lookup performance. BTW please use a short, descriptive subject for future emails. > On Sep 9, 2015, at 3:13 AM, Akhil Das wrote: > > Try this: > > val tocks =
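A minimal sketch of that suggestion, assuming an rdd of token strings and an exact-match lookup (for substring matching a set offers no asymptotic win):

```scala
val ecTokens = Set("EC-17A5206955089011B", "EC-17A5206955089011A")
val tokensBc = sc.broadcast(ecTokens)  // ship the set once per executor

// contains on a Set is O(1), versus a linear scan of an Array
val matched = rdd.filter(record => tokensBc.value.contains(record))
```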

Re: Loading json data into Pair RDD in Spark using java

2015-09-09 Thread Ted Yu
Please take a look at the example in SPARK-10287 FYI On Wed, Sep 9, 2015 at 8:50 AM, prachicsa wrote: > > > I am very new to Spark. > > I have a very basic question. I read a file in Spark RDD in which each line > is a JSON. I want to apply groupBy like
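Not the SPARK-10287 example itself, but a minimal sketch of one way to key JSON lines into a pair RDD, using the json4s library that ships with Spark 1.x; the file path and the "app" field are hypothetical:

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

val lines = sc.textFile("events.json")  // hypothetical file; one JSON object per line
val pairs = lines.map { line =>
  implicit val formats = DefaultFormats  // defined inside the closure to stay serializable
  ((parse(line) \ "app").extract[String], line)  // assumes each record has an "app" field
}
val grouped = pairs.groupByKey()
```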

Re: ArrayIndexOutOfBoundsException when using repartitionAndSortWithinPartitions()

2015-09-09 Thread Ted Yu
Which release of Spark are you using ? Can you show skeleton of your partitioner and comparator ? Thanks > On Sep 9, 2015, at 4:45 PM, Ashish Shenoy wrote: > > Hi, > > I am trying to sort a RDD pair using repartitionAndSortWithinPartitions() for > my key [which

Re: build on spark 1.5.0 error with Execution scala-compile-first of goal & Compile failed via zinc server

2015-09-09 Thread Ted Yu
I used your first command with mvn 3.3.3 (without build/). The build passed. FYI On Wed, Sep 9, 2015 at 8:50 PM, stark_summer wrote: > codeurl: http://d3kbcqa49mib13.cloudfront.net/spark-1.5.0.tgz > build scripts: > > build/mvn -Phadoop-2.3 -Dhadoop.version=2.3.0-cdh5.1.0

Re: Failed when starting Spark 1.5.0 standalone cluster

2015-09-09 Thread Ted Yu
See the following announcement: http://search-hadoop.com/m/q3RTtojAyW1dabFk On Wed, Sep 9, 2015 at 9:05 PM, Netwaver wrote: > Hi Spark experts, > I am trying to migrate my Spark cluster from > 1.4.1 to latest 1.5.0 , but meet below issues when run

Re: Filtering an rdd depending upon a list of values in Spark

2015-09-09 Thread Ted Yu
Take a look at the following methods: * Filters rows using the given condition. * {{{ * // The following are equivalent: * peopleDf.filter($"age" > 15) * peopleDf.where($"age" > 15) * }}} * @group dfops * @since 1.3.0 */ def filter(condition: Column): DataFrame
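Extending the filter / where example above to the original question of matching against a list of values; the column name is hypothetical, and isin is available as of Spark 1.5 (earlier releases named the method in):

```scala
import org.apache.spark.sql.functions.col

val wanted = Seq("EC-17A5206955089011B", "EC-17A5206955089011A")
val matched = df.filter(col("token").isin(wanted: _*))  // hypothetical column name
```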

Re: performance when checking if data frame is empty or not

2015-09-09 Thread Ted Yu
Have you tried: df.rdd.isEmpty Cheers On Tue, Sep 8, 2015 at 1:22 PM, Axel Dahl wrote: > I have a join, that fails when one of the data frames is empty. > > To avoid this I am hoping to check if the dataframe is empty or not before > the join. > > The question is
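A minimal sketch of that guard, with a hypothetical join key:

```scala
if (!df.rdd.isEmpty) {
  val joined = df.join(otherDf, df("id") === otherDf("id"))  // hypothetical join key
  // ... continue with joined ...
}
```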

Re: Java vs. Scala for Spark

2015-09-08 Thread Ted Yu
Performance-wise, Scala is by far the best choice when you use Spark. The cost of learning Scala is not negligible but not insurmountable either. My personal opinion. On Tue, Sep 8, 2015 at 6:50 AM, Bryan Jeffrey wrote: > All, > > We're looking at language choice in

Re: Java vs. Scala for Spark

2015-09-08 Thread Ted Yu
/scalac are huge > resource hogs, since so much of Scala is really implemented in the > compiler; prepare to update your laptop to develop in Scala on your > IDE of choice, and start to think about running long-running compile > servers like we did in the year 2000. > > Still net

Re: 1.5 Build Errors

2015-09-08 Thread Ted Yu
Do you run Zinc while compiling ? Cheers On Tue, Sep 8, 2015 at 7:56 AM, Benjamin Zaitlen wrote: > I'm still getting errors with 3g. I've increased to 4g and I'll report back > > To be clear: > > export MAVEN_OPTS="-Xmx4g -XX:MaxPermSize=1024M >

Re: SparkContext initialization error- java.io.IOException: No space left on device

2015-09-06 Thread Ted Yu
Use the following command if needed: df -i /tmp See https://wiki.gentoo.org/wiki/Knowledge_Base:No_space_left_on_device_while_there_is_plenty_of_space_available On Sun, Sep 6, 2015 at 6:15 AM, Shixiong Zhu wrote: > The folder is in "/tmp" by default. Could you use "df -h" to

Re: Small File to HDFS

2015-09-04 Thread Ted Yu
What about concurrent access (read / update) to the small file with the same key ? That can get a bit tricky. On Thu, Sep 3, 2015 at 2:47 PM, Jörn Franke wrote: > Well it is the same as in normal hdfs, deleting the file and putting a new one with > the same name works. > > On Thu, 3

Re: Small File to HDFS

2015-09-04 Thread Ted Yu
need NOSQL like random update access. >> >> >> >> >> >> On Fri, Sep 4, 2015 at 9:56 AM, Ted Yu <yuzhih...@gmail.com> wrote: >> >>> What about concurrent access (read / update) to the small file with same >>> key ? >>>

Re: Small File to HDFS

2015-09-03 Thread Ted Yu
possible to easily process Pig on it >> directly ? >> >> Tks >> Nicolas >> >> - Original message - >> From: "Tao Lu" <taolu2...@gmail.com> >> To: nib...@free.fr >> Cc: "Ted Yu" <yuzhih...@gmail.com>, "

Re: Hbase Lookup

2015-09-03 Thread Ted Yu
Ayan: Please read this: http://hbase.apache.org/book.html#cp Cheers On Thu, Sep 3, 2015 at 2:13 PM, ayan guha wrote: > Hi > > Thanks for your comments. My driving point is instead of loading Hbase > data entirely I want to process record by record lookup and that is best >

Re: Small File to HDFS

2015-09-02 Thread Ted Yu
Instead of storing those messages in HDFS, have you considered storing them in a key-value store (e.g. HBase) ? Cheers On Wed, Sep 2, 2015 at 9:07 AM, wrote: > Hello, > I'm currently using Spark Streaming to collect small messages (events) , > size being <50 KB , volume is high

Re: Save dataframe into hbase

2015-09-02 Thread Ted Yu
The following JIRA is close to integration: HBASE-14181 Add Spark DataFrame DataSource to HBase-Spark Module after which hbase would provide better support for DataFrame interaction. On Wed, Sep 2, 2015 at 1:21 PM, ALEX K wrote: > you can use Phoenix-Spark plugin: >

Re: Conditionally do things different on the first minibatch vs subsequent minibatches in a dstream

2015-09-01 Thread Ted Yu
Can you utilize the following method in StreamingListener ? override def onBatchStarted(batchStarted: StreamingListenerBatchStarted) { Cheers On Tue, Sep 1, 2015 at 12:36 PM, steve_ash wrote: > We have some logic that we need to apply while we are processing the events
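A sketch of the listener-based approach, assuming all you need is to detect that the very first minibatch has started:

```scala
import java.util.concurrent.atomic.AtomicBoolean
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchStarted}

val firstBatchSeen = new AtomicBoolean(false)

ssc.addStreamingListener(new StreamingListener {
  override def onBatchStarted(batchStarted: StreamingListenerBatchStarted): Unit = {
    if (firstBatchSeen.compareAndSet(false, true)) {
      // logic that should run only for the first minibatch goes here
    }
  }
})
```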

Re: Spark shell and StackOverFlowError

2015-08-31 Thread Ted Yu
>> JIRA, but arguably it's something that's nice to just work but isn't >> >> to do with Spark per se. Or, have a look at others related to the >> >> closure and shell and you may find this is related to other known >> >> behavior. >> >

Re: Spark shell and StackOverFlowError

2015-08-31 Thread Ted Yu
a:1183) > > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547) > > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508) > > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431) > > at j

Re: Spark executor OOM issue on YARN

2015-08-31 Thread Ted Yu
Please see this thread w.r.t. spark.sql.shuffle.partitions : http://search-hadoop.com/m/q3RTtE7JOv1bDJtY FYI On Mon, Aug 31, 2015 at 11:03 AM, unk1102 wrote: > Hi I have Spark job and its executors hits OOM issue after some time and my > job hangs because of it followed

Re: How to send RDD result to REST API?

2015-08-31 Thread Ted Yu
> On Fri, Aug 28, 2015 at 9:45 PM, Ted Yu <yuzhih...@gmail.com> wrote: > >> What format does your REST server expect ? >> >> You may have seen this: >> >> https://www.paypal-engineering.com/2014/02/13/hello-newman-a-rest-client-for-scala/ >>

Re: Spark shell and StackOverFlowError

2015-08-31 Thread Ted Yu
I used the notation on JIRA where bq means quote. FYI On Mon, Aug 31, 2015 at 12:34 PM, Ashish Shrowty <ashish.shro...@gmail.com> wrote: > Yes .. I am closing the stream. > > Not sure what you meant by "bq. and then create rdd"? > > -Ashish > > On Mon, A

Re: Write Concern used in Mongo-Hadoop Connector

2015-08-31 Thread Ted Yu
Take a look at the following: https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/MongoOutputFormat.java https://jira.mongodb.org/plugins/servlet/mobile#issue/HADOOP-82 > On Aug 31, 2015, at 4:39 AM, Deepesh Maheshwari >

Re: spark-submit issue

2015-08-30 Thread Ted Yu
Pranay: Please take a look at the Redirector class inside: ./launcher/src/test/java/org/apache/spark/launcher/SparkLauncherSuite.java Cheers On Sun, Aug 30, 2015 at 11:25 AM, Pranay Tonpay pranay.ton...@impetus.co.in wrote: yes, the context is being closed at the end.

Re: Spark shell and StackOverFlowError

2015-08-30 Thread Ted Yu
Using Spark shell : scala> import scala.collection.mutable.MutableList import scala.collection.mutable.MutableList scala> val lst = MutableList[(String,String,Double)]() lst: scala.collection.mutable.MutableList[(String, String, Double)] = MutableList() scala>

Re: Spark shell and StackOverFlowError

2015-08-30 Thread Ted Yu
:55 AM Ted Yu yuzhih...@gmail.com wrote: Using Spark shell : scala> import scala.collection.mutable.MutableList import scala.collection.mutable.MutableList scala> val lst = MutableList[(String,String,Double)]() lst: scala.collection.mutable.MutableList[(String, String, Double)] = MutableList

Re: Spark Version upgrade isue:Exception in thread main java.lang.NoSuchMethodError

2015-08-30 Thread Ted Yu
Manohar: See if adding the following dependency to your project helps: <dependency> +<groupId>com.fasterxml.jackson.core</groupId> +<artifactId>jackson-databind</artifactId> +<version>${fasterxml.jackson.version}</version> + </dependency> + <dependency> +

Re: submit_spark_job_to_YARN

2015-08-30 Thread Ted Yu
This is related: SPARK-10288 Add a rest client for Spark on Yarn FYI On Sun, Aug 30, 2015 at 12:12 PM, Dawid Wysakowicz wysakowicz.da...@gmail.com wrote: Hi Ajay, In short story: No, there is no easy way to do that. But if you'd like to play around this topic a good starting point would be

Re: Where is Redgate's HDFS explorer?

2015-08-29 Thread Ted Yu
See https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html FYI On Sat, Aug 29, 2015 at 1:04 AM, Akhil Das ak...@sigmoidanalytics.com wrote: You can also mount HDFS through the NFS gateway and access i think. Thanks Best Regards On Tue, Aug 25, 2015 at

Re: How to send RDD result to REST API?

2015-08-28 Thread Ted Yu
What format does your REST server expect ? You may have seen this: https://www.paypal-engineering.com/2014/02/13/hello-newman-a-rest-client-for-scala/ On Fri, Aug 28, 2015 at 9:35 PM, Cassa L lcas...@gmail.com wrote: Hi, If I have RDD that counts something e.g.: JavaPairDStream<String,

Re: how to register CompactBuffer in Kryo

2015-08-28 Thread Ted Yu
For the exception w.r.t. ManifestFactory, there is SPARK-6497, which is open. FYI On Fri, Aug 28, 2015 at 8:25 AM, donhoff_h 165612...@qq.com wrote: Hi all, I wrote a Spark program which uses Kryo serialization. When I count an rdd whose type is RDD[(String,String)], it reported an

Re: Alternative to Large Broadcast Variables

2015-08-28 Thread Ted Yu
+1 on Jason's suggestion. bq. this large variable is broadcast many times during the lifetime Please consider making this large variable more granular. Meaning, reduce the amount of data transferred between the key value store and your app during update. Cheers On Fri, Aug 28, 2015 at 12:44

Re: Building spark-examples takes too much time using Maven

2015-08-26 Thread Ted Yu
Can you provide a bit more information ? Do the Spark artifacts packaged by you have the same names / paths (in the maven repo) as the ones published by Apache Spark ? Is Zinc running on the machine where you performed the build ? Cheers On Wed, Aug 26, 2015 at 7:56 AM, Muhammad Haseeb Javed

Re: How to increase data scale in Spark SQL Perf

2015-08-26 Thread Ted Yu
), then eventually I will see OutOfMemory occur Can you guys try to run it if you have the environment ? I think you may reproduce it. Thanks! At 2015-08-26 13:01:34, Ted Yu yuzhih...@gmail.com wrote: The error in #1 below was not informative. Are you able to get more detailed error message

Re: Issue with building Spark v1.4.1-rc4 with Scala 2.11

2015-08-26 Thread Ted Yu
Have you run dev/change-version-to-2.11.sh ? Cheers On Wed, Aug 26, 2015 at 7:07 AM, Felix Neutatz neut...@googlemail.com wrote: Hi everybody, I tried to build Spark v1.4.1-rc4 with Scala 2.11: ../apache-maven-3.3.3/bin/mvn -Dscala-2.11 -DskipTests clean install Before running this, I

Re: Spark-Ec2 launch failed on starting httpd spark 141

2015-08-25 Thread Ted Yu
Looks like it is this PR: https://github.com/mesos/spark-ec2/pull/133 On Tue, Aug 25, 2015 at 9:52 AM, Shivaram Venkataraman shiva...@eecs.berkeley.edu wrote: Yeah thats a know issue and we have a PR out to fix it. Shivaram On Tue, Aug 25, 2015 at 7:39 AM, Garry Chen g...@cornell.edu

Re: Spark-Ec2 launch failed on starting httpd spark 141

2015-08-25 Thread Ted Yu
Corrected a typo in the subject of your email. What you cited seems to be from worker node startup. Was there other error you saw ? Please list the command you used. Cheers On Tue, Aug 25, 2015 at 7:39 AM, Garry Chen g...@cornell.edu wrote: Hi All, I am trying to launch a

Re: How to increase data scale in Spark SQL Perf

2015-08-25 Thread Ted Yu
The error in #1 below was not informative. Are you able to get a more detailed error message ? Thanks On Aug 25, 2015, at 6:57 PM, Todd bit1...@163.com wrote: Thanks Ted Yu. Following is the error message: 1. The exception that is shown on the UI is : Exception in thread Thread-113

Re: How to increase data scale in Spark SQL Perf

2015-08-25 Thread Ted Yu
Looks like you were attaching images to your email which didn't go through. Consider using third party site for images - or paste error in text. Cheers On Tue, Aug 25, 2015 at 4:22 AM, Todd bit1...@163.com wrote: Hi, The spark sql perf itself contains benchmark data generation. I am using

Re: Protobuf error when streaming from Kafka

2015-08-24 Thread Ted Yu
Can you show the complete stack trace ? Which Spark / Kafka release are you using ? Thanks On Mon, Aug 24, 2015 at 4:58 PM, Cassa L lcas...@gmail.com wrote: Hi, I am storing messages in Kafka using protobuf and reading them into Spark. I upgraded protobuf version from 2.4.1 to 2.5.0. I got

Re: Error when saving a dataframe as ORC file

2015-08-23 Thread Ted Yu
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "***") sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "**") However, the error still occurs for ORC format. If I change the format to JSON, although the error does not go away, the JSON files can be saved successfully. On Sun, Aug 23, 2015 at 5:51 AM, Ted Yu

Re: Error when saving a dataframe as ORC file

2015-08-23 Thread Ted Yu
You may have seen this: http://search-hadoop.com/m/q3RTtdSyM52urAyI On Aug 23, 2015, at 1:01 AM, lostrain A donotlikeworkingh...@gmail.com wrote: Hi, I'm trying to save a simple dataframe to S3 in ORC format. The code is as follows: val sqlContext = new

Re: Error when saving a dataframe as ORC file

2015-08-23 Thread Ted Yu
On Aug 23, 2015, at 12:49 PM, lostrain A donotlikeworkingh...@gmail.com wrote: Ted, Thanks for the suggestions. Actually I tried both s3n and s3 and the result remains the same. On Sun, Aug 23, 2015 at 12:27 PM, Ted Yu yuzhih...@gmail.com wrote: In your case, I would specify fs.s3

Re: subscribe

2015-08-22 Thread Ted Yu
See http://spark.apache.org/community.html Cheers On Sat, Aug 22, 2015 at 2:51 AM, Lars Hermes li...@hermes-it-consulting.de wrote: subscribe - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional

Re: How can I save the RDD result as Orcfile with spark1.3?

2015-08-22 Thread Ted Yu
to do this with spark 1.3? such as write the orcfile manually in foreachPartition method? On Sat, Aug 22, 2015 at 12:19 PM, Ted Yu yuzhih...@gmail.com wrote: ORC support was added in Spark 1.4. See SPARK-2883 On Fri, Aug 21, 2015 at 7:36 PM, dong.yajun dongt...@gmail.com wrote: Hi list
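For readers on 1.4+, a sketch of what the writer call looks like; ORC requires a HiveContext-backed DataFrame, and the output path is hypothetical:

```scala
df.write.format("orc").save("/tmp/orc_out")  // hypothetical output path
```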

Re: Want to install lz4 compression

2015-08-21 Thread Ted Yu
Have you read this ? http://stackoverflow.com/questions/22716346/how-to-use-lz4-compression-in-linux-3-11 On Aug 21, 2015, at 6:57 AM, saif.a.ell...@wellsfargo.com saif.a.ell...@wellsfargo.com wrote: Hi all, I am using pre-compiled spark with hadoop 2.6. LZ4 Codec is not on hadoop’s

Re: Spark-Cassandra-connector

2015-08-21 Thread Ted Yu
Have you considered asking this question on https://groups.google.com/a/lists.datastax.com/forum/#!forum/spark-connector-user ? Cheers On Thu, Aug 20, 2015 at 10:57 PM, Samya samya.ma...@amadeus.com wrote: Hi All, I need to write an RDD to Cassandra using the sparkCassandraConnector from

Re: SparkSQL concerning materials

2015-08-20 Thread Ted Yu
See also http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.package Cheers On Thu, Aug 20, 2015 at 7:50 AM, Muhammad Atif muhammadatif...@gmail.com wrote: Hi Dawid The best place to get started is the Spark SQL Guide from Apache

Re: Scala: How to match a java object????

2015-08-19 Thread Ted Yu
Saif: In your example below, the error was due to there being no automatic conversion from Int to BigDecimal. Cheers On Aug 19, 2015, at 6:40 AM, saif.a.ell...@wellsfargo.com saif.a.ell...@wellsfargo.com wrote: Hi, thank you all for the assistance. It is odd, it works when creating a

Re: What's the best practice for developing new features for spark ?

2015-08-19 Thread Ted Yu
See this thread: http://search-hadoop.com/m/q3RTtdZv0d1btRHl/Spark+build+modulesubj=Building+Spark+Building+just+one+module+ On Aug 19, 2015, at 1:44 AM, canan chen ccn...@gmail.com wrote: I want to work on one jira, but it is not easy to do unit test, because it involves different

Re: Spark executor lost because of GC overhead limit exceeded even though using 20 executors using 25GB each

2015-08-18 Thread Ted Yu
Do you mind providing a bit more information ? Release of Spark, code snippet of your app, version of Java Thanks On Tue, Aug 18, 2015 at 8:57 AM, unk1102 umesh.ka...@gmail.com wrote: Hi this GC overhead limit error is making me crazy. I have 20 executors using 25 GB each I don't understand

Re: What am I missing that's preventing javac from finding the libraries (CLASSPATH is setup...)?

2015-08-18 Thread Ted Yu
Normally people would establish a maven project with Spark dependencies, or use sbt. Can you go with either approach ? Cheers On Tue, Aug 18, 2015 at 10:28 AM, Jerry jerry.c...@gmail.com wrote: Hello, So I set up Spark to run on my local machine to see if I can reproduce the issue I'm having

Re: java.lang.IllegalAccessError: class com.google.protobuf.HBaseZeroCopyByteString cannot access its superclass com.google.protobuf.LiteralByteString

2015-08-17 Thread Ted Yu
Have you tried adding path to hbase-protocol jar to spark.driver.extraClassPath and spark.executor.extraClassPath ? Cheers On Mon, Aug 17, 2015 at 7:51 PM, stark_summer stark_sum...@qq.com wrote: spark vesion:1.4.1 java version:1.7 hadoop version: Hadoop 2.3.0-cdh5.1.0 submit spark job to
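A sketch of that configuration set programmatically; the jar path is hypothetical. Note that the driver-side classpath generally has to be supplied before the driver JVM starts (e.g. via --conf on spark-submit or in spark-defaults.conf), so the programmatic form below is mainly illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("hbase-app")  // hypothetical app name
  .set("spark.driver.extraClassPath", "/opt/hbase/lib/hbase-protocol.jar")
  .set("spark.executor.extraClassPath", "/opt/hbase/lib/hbase-protocol.jar")
val sc = new SparkContext(conf)
```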

Re: Paper on Spark SQL

2015-08-17 Thread Ted Yu
I got a 404 when trying to access the link. On Aug 17, 2015, at 5:31 AM, Todd bit1...@163.com wrote: Hi, I can't access http://people.csail.mit.edu/matei/papers/2015/sigmod_spark_sql.pdf. Could someone help try to see if it is available and reply with it? Thanks!

Re: Paper on Spark SQL

2015-08-17 Thread Ted Yu
Thanks Nan. That is why I always put an extra space between URL and punctuation in my comments / emails. On Mon, Aug 17, 2015 at 6:31 AM, Nan Zhu zhunanmcg...@gmail.com wrote: an extra “,” is at the end -- Nan Zhu http://codingcat.me On Monday, August 17, 2015 at 9:28 AM, Ted Yu wrote

Re: Spark on scala 2.11 build fails due to incorrect jline dependency in REPL

2015-08-17 Thread Ted Yu
You were building against 1.4.x, right ? In the master branch, switch-to-scala-2.11.sh is gone. There is a scala-2.11 profile. FYI On Sun, Aug 16, 2015 at 11:12 AM, Stephen Boesch java...@gmail.com wrote: I am building spark with the following options - most notably the **scala-2.11**: .

Re: Can't find directory after resetting REPL state

2015-08-15 Thread Ted Yu
I tried with master branch and got the following: http://pastebin.com/2nhtMFjQ FYI On Sat, Aug 15, 2015 at 1:03 AM, Kevin Jung itsjb.j...@samsung.com wrote: Spark shell can't find base directory of class server after running :reset command. scala :reset scala 1 uncaught exception during

Re: Spark RuntimeException hadoop output format

2015-08-14 Thread Ted Yu
path? What's the purpose of prefix and where do I specify the path if not in prefix? On Fri, Aug 14, 2015 at 4:36 PM, Ted Yu yuzhih...@gmail.com wrote: Please take a look at JavaPairDStream.scala: def saveAsHadoopFiles[F <: OutputFormat[_, _]]( prefix: String, suffix: String

Re: Spark RuntimeException hadoop output format

2015-08-14 Thread Ted Yu
to a local file on the local file system for verification and I see the data: $ ls -ltr !$ ls -ltr /tmp/out -rw-r--r-- 1 yarn yarn 5230 Aug 13 15:45 /tmp/out On Fri, Aug 14, 2015 at 6:15 AM, Ted Yu yuzhih...@gmail.com wrote: Which Spark release are you using ? Can you show us snippet

Re: Spark RuntimeException hadoop output format

2015-08-14 Thread Ted Yu
Which Spark release are you using ? Can you show us a snippet of your code ? Have you checked the namenode log ? Thanks On Aug 13, 2015, at 10:21 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I was able to get this working by using an alternative method however I only see 0-byte files in

Re: graphx class not found error

2015-08-13 Thread Ted Yu
The code and error didn't go through. Mind sending again ? Which Spark release are you using ? On Thu, Aug 13, 2015 at 6:17 PM, dizzy5112 dave.zee...@gmail.com wrote: the code below works perfectly on both cluster and local modes but when i try to create a graph in cluster mode (it works

Re: Materials for deep insight into Spark SQL

2015-08-13 Thread Ted Yu
You can look under Developer Track: https://spark-summit.org/2015/#day-1 http://www.slideshare.net/jeykottalam/spark-sqlamp-camp2014?related=1 (slightly old) Catalyst design: https://docs.google.com/a/databricks.com/document/d/1Hc_Ehtr0G8SQUg69cmViZsMi55_Kf3tISD9GPGU5M1Y/edit FYI On Thu, Aug

Re: make-distribution.sh failing at spark/R/lib/sparkr.zip

2015-08-12 Thread Ted Yu
I ran your command on Linux which passed. Are you going to use SparkR ? If so, consider including the following: -Psparkr Cheers On Wed, Aug 12, 2015 at 3:31 AM, MEETHU MATHEW meethu2...@yahoo.co.in wrote: Hi, I am trying to create a package using the make-distribution.sh script from the

Re: Re: Re: Package Release Announcement: Spark SQL on HBase Astro

2015-08-11 Thread Ted Yu
, *From:* Ted Yu [mailto:yuzhih...@gmail.com] *Sent:* Tuesday, August 11, 2015 3:28 PM *To:* Yan Zhou.sc *Cc:* Bing Xiao (Bing); d...@spark.apache.org; user@spark.apache.org *Subject:* Re: Re: Package Release Announcement: Spark SQL on HBase Astro HBase will not have a query engine

Re: spark vs flink low memory available

2015-08-11 Thread Ted Yu
Pa: Can you try a 1.5.0 SNAPSHOT ? See SPARK-7075 Project Tungsten (Spark 1.5 Phase 1) Cheers On Tue, Aug 11, 2015 at 12:49 AM, jun kit...@126.com wrote: could you share the details of your log file? At 2015-08-10 22:02:16, Pa Rö paul.roewer1...@googlemail.com wrote: hi community, i have built a spark and

Re: Re: Package Release Announcement: Spark SQL on HBase Astro

2015-08-11 Thread Ted Yu
, …, etc., which allows for loosely-coupled query engines built on top of it. Thanks, From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: August 11, 2015 8:54 To: Bing Xiao (Bing) Cc: d...@spark.apache.org; user@spark.apache.org; Yan Zhou.sc Subject: Re: Package Release Announcement: Spark SQL

Re: Unsupported major.minor version 51.0

2015-08-11 Thread Ted Yu
What does the following command say ? mvn -version Maybe you are using an old maven ? Cheers On Tue, Aug 11, 2015 at 7:55 AM, Yakubovich, Alexey alexey.yakubov...@searshc.com wrote: I found some discussions online, but it all comes down to advice to use JDK 1.7 (or 1.8). Well, I use JDK 1.7 on

Re: unsubscribe

2015-08-11 Thread Ted Yu
See first section of http://spark.apache.org/community.html On Tue, Aug 11, 2015 at 9:47 AM, Michel Robert m...@us.ibm.com wrote: Michel Robert Almaden Research Center EDA - IBM Systems and Technology Group Phone: (408) 927-2117 T/L 8-457-2117 E-mail: m...@us.ibm.com

Re: Does print/event logging affect performance?

2015-08-11 Thread Ted Yu
What level of logging are you looking at ? At INFO level, there shouldn't be noticeable difference. On Tue, Aug 11, 2015 at 12:24 PM, saif.a.ell...@wellsfargo.com wrote: Hi all, silly question. Does logging info messages, both print or to file, or event logging, cause any impact to general

Re: How to fix OutOfMemoryError: GC overhead limit exceeded when using Spark Streaming checkpointing

2015-08-10 Thread Ted Yu
I wonder during recovery from a checkpoint whether we can estimate the size of the checkpoint and compare it with Runtime.getRuntime().freeMemory(). If the size of the checkpoint is much bigger than free memory, log a warning, etc. Cheers On Mon, Aug 10, 2015 at 9:34 AM, Dmitry Goldenberg
