Re: SIGBUS (0xa) when using DataFrameWriter.insertInto

2018-10-27 Thread Ted Yu
I don't seem to find the log. Can you double check ? Thanks Original message From: alexzautke Date: 10/27/18 8:54 AM (GMT-08:00) To: user@spark.apache.org Subject: Re: SIGBUS (0xa) when using DataFrameWriter.insertInto Please also find attached a complete error log. --

Re: error while submitting job

2018-09-29 Thread Ted Yu
Can you tell us the version of Spark and the connector you used ? Thanks  Original message From: yuvraj singh <19yuvrajsing...@gmail.com> Date: 9/29/18 10:42 PM (GMT-08:00) To: user@spark.apache.org Subject: error while submitting job Hi , i am getting this error please

Re: OOM: Structured Streaming aggregation state not cleaned up properly

2018-05-19 Thread Ted Yu
Hi, w.r.t. ElementTrackingStore, since it is backed by KVStore, there should be other classes which occupy significant memory. Can you pastebin the top 10 entries among the heap dump ? Thanks
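
One way to produce the top heap entries requested here is a live-object class histogram from the JDK's jmap tool (the pid placeholder stands for the driver or executor process):

  jmap -histo:live <pid> | head -n 12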

Re: KafkaUtils.createStream(..) is removed for API

2018-02-18 Thread Ted Yu
createStream() is still in external/kafka-0-8/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala But it is not in external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaUtils.scala FYI On Sun, Feb 18, 2018 at 5:17 PM, naresh Goud
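
For readers hitting this: a minimal sketch of the replacement direct-stream API in the kafka-0-10 module (broker address, group id, and topic are placeholders; ssc is assumed to be an existing StreamingContext):

  import org.apache.kafka.common.serialization.StringDeserializer
  import org.apache.spark.streaming.kafka010.KafkaUtils
  import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
  import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

  val kafkaParams = Map[String, Object](
    "bootstrap.servers" -> "localhost:9092",            // placeholder broker list
    "key.deserializer" -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id" -> "example-group",                      // placeholder group id
    "auto.offset.reset" -> "latest")

  val stream = KafkaUtils.createDirectStream[String, String](
    ssc, PreferConsistent, Subscribe[String, String](Seq("mytopic"), kafkaParams))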

Re: Broken SQL Visualization?

2018-01-15 Thread Ted Yu
Did you include any picture ? Looks like the picture didn't go thru. Please use a third-party site. Thanks Original message From: Tomasz Gawęda Date: 1/15/18 2:07 PM (GMT-08:00) To: d...@spark.apache.org, user@spark.apache.org Subject: Broken SQL

Re: how to mention others in JIRA comment please?

2017-06-26 Thread Ted Yu
You can find the JIRA handle of the person you want to mention by going to a JIRA where that person has commented. e.g. you want to find the handle for Joseph. You can go to: https://issues.apache.org/jira/browse/SPARK-6635 and click on his name in comment:

Re: the compile of spark stoped without any hints, would you like help me please?

2017-06-25 Thread Ted Yu
Does adding -X to the mvn command give you more information ? Cheers On Sun, Jun 25, 2017 at 5:29 AM, 萝卜丝炒饭 <1427357...@qq.com> wrote: > Hi all, > > Today I used a new PC to compile SPARK. > At the beginning, it worked well. > But it stopped at some point. > The content in the console is : >

Re: HBaseContext with Spark

2017-01-25 Thread Ted Yu
Does the storage handler provide bulk load capability ? Cheers > On Jan 25, 2017, at 3:39 AM, Amrit Jangid wrote: > > Hi chetan, > > If you just need HBase Data into Hive, You can use Hive EXTERNAL TABLE with > STORED BY

Re: HBaseContext with Spark

2017-01-25 Thread Ted Yu
The references are vendor specific. Suggest contacting the vendor's mailing list for your PR. My initial interpretation of "HBase repository" was the Apache one. Cheers On Wed, Jan 25, 2017 at 7:38 AM, Chetan Khatri <chetan.opensou...@gmail.com> wrote: > @Ted Yu, Correct but HBase-Spa

Re: HBaseContext with Spark

2017-01-25 Thread Ted Yu
Though no hbase release has the hbase-spark module, you can find the backport patch on HBASE-14160 (for Spark 1.6) You can build the hbase-spark module yourself. Cheers On Wed, Jan 25, 2017 at 3:32 AM, Chetan Khatri wrote: > Hello Spark Community Folks, > >

Re: Approach: Incremental data load from HBASE

2016-12-21 Thread Ted Yu
of processing is delivered to hbase. Cheers On Wed, Dec 21, 2016 at 8:00 AM, Chetan Khatri <chetan.opensou...@gmail.com> wrote: > Ok, Sure will ask. > > But what would be generic best practice solution for Incremental load from > HBASE. > > On Wed, Dec 21, 2016 at 8:42 PM, Ted

Re: Approach: Incremental data load from HBASE

2016-12-21 Thread Ted Yu
I haven't used Gobblin. You can consider asking Gobblin mailing list of the first option. The second option would work. On Wed, Dec 21, 2016 at 2:28 AM, Chetan Khatri wrote: > Hello Guys, > > I would like to understand different approach for Distributed

Re: namespace quota not take effect

2016-08-25 Thread Ted Yu
This question should have been posted to user@. Looks like you were using the wrong config. See: http://hbase.apache.org/book.html#quota See the 'Setting Namespace Quotas' section further down. Cheers On Tue, Aug 23, 2016 at 11:38 PM, W.H wrote: > hi guys > I am testing the hbase

Re: Attempting to accept an unknown offer

2016-08-17 Thread Ted Yu
i am creating data frame from a hive sql. There are other > similar jobs which work fine > > On Wed, Aug 17, 2016 at 8:52 AM, Ted Yu <yuzhih...@gmail.com> wrote: > >> Can you provide more information ? >> >> Were you running on YARN ? >> Which version

Re: Attempting to accept an unknown offer

2016-08-17 Thread Ted Yu
Can you provide more information ? Were you running on YARN ? Which version of Spark are you using ? Was your job failing ? Thanks On Wed, Aug 17, 2016 at 8:46 AM, vr spark wrote: > > W0816 23:17:01.984846 16360 sched.cpp:1195] Attempting to accept an > unknown offer

Re: Undefined function json_array_to_map

2016-08-17 Thread Ted Yu
Can you show the complete stack trace ? Which version of Spark are you using ? Thanks On Wed, Aug 17, 2016 at 8:46 AM, vr spark wrote: > Hi, > I am getting error on below scenario. Please suggest. > > i have a virtual view in hive > > view name log_data > it has 2

Re: Spark 2.0.0 JaninoRuntimeException

2016-08-16 Thread Ted Yu
code with 600 columns, but it's a converted dataset of case classes to > dataframe. This is deterministically causing the error in Scala 2.11. > > Once I can get a deterministically breaking test without work code I will > try to file a Jira bug. > > On Tue, Aug 16, 2016, 04:17 Ted

Re: long lineage

2016-08-16 Thread Ted Yu
Have you tried periodic checkpoints ? Cheers > On Aug 16, 2016, at 5:50 AM, pseudo oduesp wrote: > > Hi , > how we can deal after raise stackoverflow trigger by long lineage ? > i mean i have this error and how resolve it wiyhout creating new session > thanks >
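
A minimal sketch of periodic checkpointing (checkpoint directory, iteration count, and cadence are illustrative; sc is an existing SparkContext):

  sc.setCheckpointDir("hdfs:///tmp/checkpoints")   // must be a reliable location
  var rdd = sc.parallelize(1 to 1000000)
  for (i <- 1 to 100) {
    rdd = rdd.map(_ + 1)                 // each iteration grows the lineage
    if (i % 10 == 0) {
      rdd.checkpoint()                   // mark for checkpointing
      rdd.count()                        // an action materializes it, cutting the lineage
    }
  }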

Re: class not found exception Logging while running JavaKMeansExample

2016-08-16 Thread Ted Yu
all the necessary log4j and sl4j dependencies in pom. I am > still not getting what dependencies I am missing. > > Best Regards, > Subash Basnet > > On Mon, Aug 15, 2016 at 6:50 PM, Ted Yu <yuzhih...@gmail.com> wrote: > >> Logging has become private in 2.0 release: >

Re: Spark 2.0.0 JaninoRuntimeException

2016-08-16 Thread Ted Yu
uce the problem in SPARK-15285 with master branch. > Should we reopen SPARK-15285? > > Best Regards, > Kazuaki Ishizaki, > > > > From:Ted Yu <yuzhih...@gmail.com> > To:dhruve ashar <dhruveas...@gmail.com> > Cc:Aris <arisofala...@

Re: class not found exception Logging while running JavaKMeansExample

2016-08-15 Thread Ted Yu
Logging has become private in 2.0 release: private[spark] trait Logging { On Mon, Aug 15, 2016 at 9:48 AM, subash basnet wrote: > Hello all, > > I am trying to run JavaKMeansExample of the spark example project. I am > getting the classnotfound exception error: > *Exception
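
Since Spark ships with slf4j, a common workaround is to declare your own logging trait instead of depending on the now-private one; a sketch (names are illustrative):

  import org.slf4j.{Logger, LoggerFactory}

  trait MyLogging {
    @transient lazy val log: Logger = LoggerFactory.getLogger(getClass)
  }

  class KMeansRunner extends MyLogging {
    def run(): Unit = log.info("starting")
  }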

Re: Spark 2.0.0 JaninoRuntimeException

2016-08-14 Thread Ted Yu
Looks like the proposed fix was reverted: Revert "[SPARK-15285][SQL] Generated SpecificSafeProjection.apply method grows beyond 64 KB" This reverts commit fa244e5a90690d6a31be50f2aa203ae1a2e9a1cf. Maybe this was fixed in some other JIRA ? On Fri, Aug 12, 2016 at 2:30 PM, dhruve ashar

Re: Why I can't use broadcast var defined in a global object?

2016-08-13 Thread Ted Yu
Can you (or David) resend David's reply ? I don't see the reply in this thread. Thanks > On Aug 13, 2016, at 8:39 PM, yaochunnan wrote: > > Hi David, > Your answers have solved my problem! Detailed and accurate. Thank you very > much! > > > > -- > View this message

Re: Single point of failure with Driver host crashing

2016-08-11 Thread Ted Yu
Have you read https://spark.apache.org/docs/latest/spark-standalone.html#high-availability ? FYI On Thu, Aug 11, 2016 at 12:40 PM, Mich Talebzadeh wrote: > > Hi, > > Although Spark is fault tolerant when nodes go down like below: > > FROM tmp > [Stage 1:===>

Re: Getting a TreeNode Exception while saving into Hadoop

2016-08-08 Thread Ted Yu
> val placesProcessed = placesUnchanged.unionAll(placesAddedWithMerchantId). > unionAll(placesUpdatedFromHotelsWithMerchantId).unionAll(pla > cesUpdatedFromRestaurantsWithMerchantId).unionAll(placesChanged) > > I'm using Spark 1.6.2. > > On Mon, Aug 8, 2016 at 3:11 PM, Ted Yu <

Re: Getting a TreeNode Exception while saving into Hadoop

2016-08-08 Thread Ted Yu
Can you show the code snippet for unionAll operation ? Which Spark release do you use ? BTW please use user@spark.apache.org in the future. On Mon, Aug 8, 2016 at 11:47 AM, max square wrote: > Hey guys, > > I'm trying to save Dataframe in CSV format after performing

Re: Multiple Sources Found for Parquet

2016-08-08 Thread Ted Yu
Can you examine the classpath to see where DefaultSource comes from ? Thanks On Mon, Aug 8, 2016 at 2:34 AM, 金国栋 wrote: > I'm using Spark2.0.0 to do sql analysis over parquet files, when using > `read().parquet("path")`, or `write().parquet("path")` in Java(I followed > the

Re: submitting spark job with kerberized Hadoop issue

2016-08-07 Thread Ted Yu
The link in Jerry's response was quite old. Please see: http://hbase.apache.org/book.html#security Thanks On Sun, Aug 7, 2016 at 6:55 PM, Saisai Shao wrote: > 1. Standalone mode doesn't support accessing kerberized Hadoop, simply > because it lacks the mechanism to

Re: Symbol HasInputCol is inaccesible from this place

2016-08-06 Thread Ted Yu
eems like, wondering if this can be made public in order to develop > custom transformers or any other alternatives ? > > On Sat, Aug 6, 2016 at 10:07 AM, Ted Yu <yuzhih...@gmail.com> wrote: > >> Is it because HasInputCol is private ? >> >> private[ml] trait Has

Re: Symbol HasInputCol is inaccesible from this place

2016-08-06 Thread Ted Yu
Is it because HasInputCol is private ? private[ml] trait HasInputCol extends Params { On Thu, Aug 4, 2016 at 1:18 PM, janardhan shetty wrote: > Version : 2.0.0-preview > > import org.apache.spark.ml.param._ > import org.apache.spark.ml.param.shared.{HasInputCol,
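
Because the shared param traits are private[ml], one workaround is to declare the param yourself in your custom transformer; a sketch under that assumption (names are illustrative):

  import org.apache.spark.ml.param.{Param, Params}

  trait MyHasInputCol extends Params {
    final val inputCol: Param[String] =
      new Param[String](this, "inputCol", "input column name")
    final def getInputCol: String = $(inputCol)
  }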

Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Ted Yu
do. > > How to fix it? > > Many thanks, > Carlo > > On 5 Aug 2016, at 17:58, Ted Yu <yuzhih...@gmail.com> wrote: > > private[spark] trait Logging { > > > -- The Open University is incorporated by Royal Charter (RC 000391), an > exempt charity in England &

Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Ted Yu
In 2.0, Logging became private: private[spark] trait Logging { FYI On Fri, Aug 5, 2016 at 9:53 AM, Carlo.Allocca wrote: > Dear All, > > I would like to ask for your help about the following issue: > java.lang.ClassNotFoundException: > org.apache.spark.Logging > > I

Re: What is "Developer API " in spark documentation?

2016-08-05 Thread Ted Yu
See previous discussion : http://search-hadoop.com/m/q3RTtTvrPrc6O2h1=Re+discuss+separate+API+annotation+into+two+components+InterfaceAudience+InterfaceStability > On Aug 5, 2016, at 2:55 AM, Aseem Bansal wrote: > > Hi > > Many of spark documentation say "Developer API".

Re: source code for org.spark-project.hive

2016-08-04 Thread Ted Yu
https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2 FYI On Thu, Aug 4, 2016 at 6:23 AM, prabhat__ wrote: > hey > can anyone point me to the source code for the jars used with group-id > org.spark-project.hive. > This was previously maintained in the private

Re: how to debug spark app?

2016-08-03 Thread Ted Yu
Have you looked at: https://spark.apache.org/docs/latest/running-on-yarn.html#debugging-your-application If you use Mesos: https://spark.apache.org/docs/latest/running-on-mesos.html#troubleshooting-and-debugging On Wed, Aug 3, 2016 at 6:13 PM, glen wrote: > Any tool like gdb?

Re: java.net.URISyntaxException: Relative path in absolute URI:

2016-08-03 Thread Ted Yu
SPARK-15899 ? On Wed, Aug 3, 2016 at 11:05 AM, Flavio wrote: > Hello everyone, > > I am try to run a very easy example but unfortunately I am stuck on the > follow exception: > > Exception in thread "main"

Re: Managed memory leak detected + OutOfMemoryError: Unable to acquire X bytes of memory, got 0

2016-08-03 Thread Ted Yu
Spark 2.0 has been released. Mind giving it a try :-) ? On Wed, Aug 3, 2016 at 9:11 AM, Rychnovsky, Dusan < dusan.rychnov...@firma.seznam.cz> wrote: > OK, thank you. What do you suggest I do to get rid of the error? > > > ---------- > *From:* Ted Yu

Re: Managed memory leak detected + OutOfMemoryError: Unable to acquire X bytes of memory, got 0

2016-08-03 Thread Ted Yu
> > > ---------- > *From:* Rychnovsky, Dusan > *Sent:* Wednesday, August 3, 2016 3:58 PM > *To:* Ted Yu > > *Cc:* user@spark.apache.org > *Subject:* Re: Managed memory leak detected + OutOfMemoryError: Unable to

Re: Managed memory leak detected + OutOfMemoryError: Unable to acquire X bytes of memory, got 0

2016-08-03 Thread Ted Yu
Are you using Spark 1.6+ ? See SPARK-11293 On Wed, Aug 3, 2016 at 5:03 AM, Rychnovsky, Dusan < dusan.rychnov...@firma.seznam.cz> wrote: > Hi, > > > I have a Spark workflow that when run on a relatively small portion of > data works fine, but when run on big data fails with strange errors. In

Re: [2.0.0] mapPartitions on DataFrame unable to find encoder

2016-08-02 Thread Ted Yu
Using spark-shell of master branch: scala> case class Entry(id: Integer, name: String) defined class Entry scala> val df = Seq((1,"one"), (2, "two")).toDF("id", "name").as[Entry] 16/08/02 16:47:01 DEBUG package$ExpressionCanonicalizer: === Result of Batch CleanExpressions ===

Re: Extracting key word from a textual column

2016-08-02 Thread Ted Yu
+1 > On Aug 2, 2016, at 2:29 PM, Jörn Franke wrote: > > If you need to use single inserts, updates, deletes, select why not use hbase > with Phoenix? I see it as complementary to the hive / warehouse offering > >> On 02 Aug 2016, at 22:34, Mich Talebzadeh

Re: Job can not terminated in Spark 2.0 on Yarn

2016-08-02 Thread Ted Yu
Which hadoop version are you using ? Can you show snippet of your code ? Thanks On Tue, Aug 2, 2016 at 10:06 AM, Liangzhao Zeng wrote: > Hi, > > > I migrate my code to Spark 2.0 from 1.6. It finish last stage (and result is > correct) but get following errors then

Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

2016-08-01 Thread Ted Yu
Ascot Moss <ascot.m...@gmail.com> wrote: > >> My JDK is Java 1.8 u40 >> >> On Sun, Jul 24, 2016 at 3:45 AM, Ted Yu <yuzhih...@gmail.com> wrote: >> >>> Since you specified +PrintGCDetails, you should be able to get some >>> more detail from the

Re: JettyUtils.createServletHandler Method not Found?

2016-08-01 Thread Ted Yu
Original discussion was about Spark 1.3 Which Spark release are you using ? Cheers On Mon, Aug 1, 2016 at 1:37 AM, bg_spark <1412743...@qq.com> wrote: > hello,I have the same problem like you, how do you solve the problem? > > > > -- > View this message in context: >

Re: ERROR Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.

2016-07-23 Thread Ted Yu
Since you specified +PrintGCDetails, you should be able to get some more detail from the GC log. Also, which JDK version are you using ? Please use Java 8 where G1GC is more reliable. On Sat, Jul 23, 2016 at 10:38 AM, Ascot Moss wrote: > Hi, > > I added the following
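
For reference, the GC flags under discussion can be passed per executor roughly like this (the class and jar names are placeholders):

  spark-submit \
    --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps" \
    --class com.example.MyApp my-app.jar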

Re: Exception in thread "dispatcher-event-loop-1" java.lang.OutOfMemoryError: Java heap space

2016-07-22 Thread Ted Yu
How much heap memory do you give the driver ? On Fri, Jul 22, 2016 at 2:17 PM, Andy Davidson < a...@santacruzintegration.com> wrote: > Given I get a stack trace in my python notebook I am guessing the driver > is running out of memory? > > My app is simple it creates a list of dataFrames from

Re: NoClassDefFoundError with ZonedDateTime

2016-07-21 Thread Ted Yu
ay to get the Classpath for the spark application > itself? > > On Thu, Jul 21, 2016 at 9:37 PM Ted Yu <yuzhih...@gmail.com> wrote: > >> Might be classpath issue. >> >> Mind pastebin'ning the effective class path ? >> >> Stack trace of NoClassDefFoundError ma

Re: NoClassDefFoundError with ZonedDateTime

2016-07-21 Thread Ted Yu
Might be classpath issue. Mind pastebin'ning the effective class path ? Stack trace of NoClassDefFoundError may also help provide some clue. On Thu, Jul 21, 2016 at 8:26 PM, Ilya Ganelin wrote: > Hello - I'm trying to deploy the Spark TimeSeries library in a new >

Re: Is it good choice to use DAO to store results generated by spark application?

2016-07-20 Thread Ted Yu
You can decide which component(s) to use for storing your data. If you haven't used hbase before, it may be better to store data on hdfs and query through Hive or SparkSQL. Maintaining hbase is not a trivial task, especially when the cluster size is large. How much data are you expecting to be

Re: Is it good choice to use DAO to store results generated by spark application?

2016-07-19 Thread Ted Yu
hbase-spark module is in the up-coming hbase 2.0 release. Currently it is in master branch of hbase git repo. FYI On Tue, Jul 19, 2016 at 8:27 PM, Andrew Ehrlich wrote: > There is a Spark<->HBase library that does this. I used it once in a > prototype (never tried in

Re: Missing Exector Logs From Yarn After Spark Failure

2016-07-19 Thread Ted Yu
What's the value for yarn.log-aggregation.retain-seconds and yarn.log-aggregation-enable ? Which hadoop release are you using ? Thanks On Tue, Jul 19, 2016 at 3:23 PM, Rachana Srivastava < rachana.srivast...@markmonitor.com> wrote: > I am trying to find the root cause of recent Spark

Re: I'm trying to understand how to compile Spark

2016-07-19 Thread Ted Yu
org.apache.spark.mllib.fpm is not a maven goal. -pl is for building individual projects (modules). Your first build action should not include -pl. On Tue, Jul 19, 2016 at 4:22 AM, Eli Super wrote: > Hi > > I have a windows laptop > > I just downloaded the spark 1.4.1 source code. > > I try

Re: Spark ResourceLeak??

2016-07-19 Thread Ted Yu
ResourceLeakDetector doesn't seem to be from Spark. Please check dependencies for potential leak. Cheers On Tue, Jul 19, 2016 at 6:11 AM, Guruji wrote: > I am running a Spark Cluster on Mesos. The module reads data from Kafka as > DirectStream and pushes it into

Re: Input path does not exist error in giving input file for word count program

2016-07-15 Thread Ted Yu
From examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala : val lines = ssc.textFileStream(args(0)) val words = lines.flatMap(_.split(" ")) In your case, looks like inputfile didn't correspond to an existing path. On Fri, Jul 15, 2016 at 1:05 AM, RK Spark

Re: Call http request from within Spark

2016-07-14 Thread Ted Yu
I second what Pedro said in the second paragraph. Issuing an http request per row would not scale. On Thu, Jul 14, 2016 at 12:26 PM, Pedro Rodriguez wrote: > Hi Amit, > > Have you tried running a subset of the IDs locally on a single thread? It > would be useful to

Re: Issue in spark job. Remote rpc client dissociated

2016-07-13 Thread Ted Yu
Which Spark release are you using ? Can you disclose what the folder processing does (code snippet is better) ? Thanks On Wed, Jul 13, 2016 at 9:44 AM, Balachandar R.A. wrote: > Hello > > In one of my use cases, i need to process list of folders in parallel. I > used

Re: Optimize filter operations with sorted data

2016-07-07 Thread Ted Yu
Does the filter under consideration operate on sorted column(s) ? Cheers > On Jul 7, 2016, at 2:25 AM, tan shai wrote: > > Hi, > > I have a sorted dataframe, I need to optimize the filter operations. > How does Spark performs filter operations on sorted dataframe? >

Re: Saving parquet table as uncompressed with write.mode("overwrite").

2016-07-03 Thread Ted Yu
Have you tried the following (note the extraneous dot in your config name) ? val c = sqlContext.setConf("spark.sql.parquet.compression.codec", "none") Also, parquet() has compression parameter which defaults to None FYI On Sun, Jul 3, 2016 at 2:42 PM, Mich Talebzadeh
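
Putting the suggestion together, a short sketch against the 1.6-style API (df is an existing DataFrame; the output path is illustrative):

  sqlContext.setConf("spark.sql.parquet.compression.codec", "none")
  df.write.mode("overwrite").parquet("/tmp/parquet_out")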

Re: Spark driver assigning splits to incorrect workers

2016-07-01 Thread Ted Yu
I guess you extended some InputFormat for providing locality information. Can you share some code snippet ? Which non-distributed file system are you using ? Thanks On Fri, Jul 1, 2016 at 2:46 PM, Raajen wrote: > I would like to use Spark on a non-distributed file system

Re: Why so many parquet file part when I store data in Alluxio or File?

2016-06-30 Thread Ted Yu
Looking under Alluxio source, it seems only "fs.hdfs.impl.disable.cache" is in use. FYI On Thu, Jun 30, 2016 at 9:30 PM, Deepak Sharma wrote: > Ok. > I came across this issue. > Not sure if you already assessed this: >

Re: Spark master shuts down when one of zookeeper dies

2016-06-30 Thread Ted Yu
the cluster is up. > But the master that was down , never comes up. > > Is this the expected ? Is there a way to get alert when a master is down ? > How to make sure that there is atleast one back up master is up always ? > > Thanks > Vimal > > > > > On Tue, Jun 2

Metadata for the StructField

2016-06-29 Thread Ted Yu
You can specify Metadata for the StructField : case class StructField( name: String, dataType: DataType, nullable: Boolean = true, metadata: Metadata = Metadata.empty) { FYI On Wed, Jun 29, 2016 at 2:50 AM, pooja mehta wrote: > Hi, > > Want to add a
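
A small sketch of attaching such metadata via MetadataBuilder (field names and the metadata key are illustrative):

  import org.apache.spark.sql.types._

  val meta = new MetadataBuilder().putString("comment", "user id").build()
  val schema = StructType(Seq(
    StructField("id", IntegerType, nullable = false, metadata = meta),
    StructField("name", StringType)))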

Re: Modify the functioning of zipWithIndex function for RDDs

2016-06-28 Thread Ted Yu
ook like: (x._1, split.startIndex + x._2 + > x._1.length) ? > > On Tue, Jun 28, 2016 at 11:09 PM, Ted Yu <yuzhih...@gmail.com> wrote: > >> Please take a look at: >> core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala >> >> In

Re: Modify the functioning of zipWithIndex function for RDDs

2016-06-28 Thread Ted Yu
Please take a look at: core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala In compute() method: val split = splitIn.asInstanceOf[ZippedWithIndexRDDPartition] firstParent[T].iterator(split.prev, context).zipWithIndex.map { x => (x._1, split.startIndex + x._2) You can
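
If patching ZippedWithIndexRDD is not an option, the same startIndex bookkeeping can be reproduced with public APIs; a rough sketch (rdd is an existing RDD):

  // First pass: count elements per partition, then derive each partition's offset.
  val counts = rdd.mapPartitions(it => Iterator(it.size), preservesPartitioning = true).collect()
  val offsets = counts.scanLeft(0L)(_ + _)
  // Second pass: assign indices; the per-element formula can be customized here.
  val indexed = rdd.mapPartitionsWithIndex { (pid, it) =>
    it.zipWithIndex.map { case (x, i) => (x, offsets(pid) + i) }
  }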

Re: Spark master shuts down when one of zookeeper dies

2016-06-28 Thread Ted Yu
Please see some blog w.r.t. the number of nodes in the quorum: http://stackoverflow.com/questions/13022244/zookeeper-reliability-three-versus-five-nodes http://www.ibm.com/developerworks/library/bd-zookeeper/ the paragraph starting with 'A quorum is represented by a strict majority of nodes'

Re: Utils and Logging cannot be accessed in package ....

2016-06-27 Thread Ted Yu
AFAICT Utils is private: private[spark] object Utils extends Logging { So is Logging: private[spark] trait Logging { FYI On Mon, Jun 27, 2016 at 8:20 AM, Paolo Patierno wrote: > Hello, > > I'm trying to use the Utils.createTempDir() method importing >
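
Since Utils is private[spark], the JDK offers a public equivalent; a sketch:

  import java.nio.file.Files

  val tmpDir = Files.createTempDirectory("myapp-").toFile   // prefix is illustrative
  tmpDir.deleteOnExit()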

Re: Arrays in Datasets (1.6.1)

2016-06-27 Thread Ted Yu
Can you show the stack trace for encoding error(s) ? Have you looked at the following test which involves NestedArray of primitive type ? ./sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala Cheers On Mon, Jun 27, 2016 at 8:50 AM, Daniel Imberman

Re: Logging trait in Spark 2.0

2016-06-24 Thread Ted Yu
See this related thread: http://search-hadoop.com/m/q3RTtEor1vYWbsW=RE+Configuring+Log4J+Spark+1+5+on+EMR+4+1+ On Fri, Jun 24, 2016 at 6:07 AM, Paolo Patierno wrote: > Hi, > > developing a Spark Streaming custom receiver I noticed that the Logging > trait isn't accessible

Re: DataFrame versus Dataset creation and usage

2016-06-24 Thread Ted Yu
In Spark 2.0, Dataset and DataFrame are unified. Would this simplify your use case ? On Fri, Jun 24, 2016 at 7:27 AM, Martin Serrano wrote: > Hi, > > I'm exposing a custom source to the Spark environment. I have a question > about the best way to approach this problem. > >
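
In 2.0 the unification is literal, so a source that produces Dataset[Row] covers both cases:

  // from the org.apache.spark.sql package object in Spark 2.0
  type DataFrame = Dataset[Row]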

Re: Kryo ClassCastException during Serialization/deserialization in Spark Streaming

2016-06-23 Thread Ted Yu
Can you illustrate how sampleMap is populated ? Thanks On Thu, Jun 23, 2016 at 12:34 PM, SRK wrote: > Hi, > > I keep getting the following error in my Spark Streaming every now and then > after the job runs for say around 10 hours. I have those 2 classes >

Re: Multiple compute nodes in standalone mode

2016-06-23 Thread Ted Yu
Have you looked at: https://spark.apache.org/docs/latest/spark-standalone.html On Thu, Jun 23, 2016 at 12:28 PM, avendaon wrote: > Hi all, > > I have a cluster that has multiple nodes, and the data partition is > unified, > therefore all my nodes in my computer can access

Re: NullPointerException when starting StreamingContext

2016-06-22 Thread Ted Yu
Which Scala version / Spark release are you using ? Cheers On Wed, Jun 22, 2016 at 8:20 PM, Sunita Arvind wrote: > Hello Experts, > > I am getting this error repeatedly: > > 16/06/23 03:06:59 ERROR streaming.StreamingContext: Error starting the > context, marking it as

Re: spark-1.6.1-bin-without-hadoop can not use spark-sql

2016-06-22 Thread Ted Yu
http://d3kbcqa49mib13.cloudfront.net/spark-1.6.1-bin-hadoop2.6.tgz> which > is a pre-built package on hadoop 2.7.2? > > ------ Original message ------ > *From:* "Ted Yu" <yuzhih...@gmail.com>; > *Sent:* Wednesday, June 22, 2016, 11:51 PM > *To:* "喜之郎" <

Re: Confusing argument of sql.functions.count

2016-06-22 Thread Ted Yu
; regardless of its type. Intuition here is that count should take no > parameter. Or am I missing something? > > Jakub > > On Wed, Jun 22, 2016 at 6:19 PM, Ted Yu <yuzhih...@gmail.com> wrote: > >> Are you referring to the following method in >> sql/core/src/main/sc

Re: Confusing argument of sql.functions.count

2016-06-22 Thread Ted Yu
Are you referring to the following method in sql/core/src/main/scala/org/apache/spark/sql/functions.scala : def count(e: Column): Column = withAggregateFunction { Did you notice this method ? def count(columnName: String): TypedColumn[Any, Long] = On Wed, Jun 22, 2016 at 9:06 AM, Jakub
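
The two overloads side by side, in a spark-shell style sketch where the implicits for $ and toDF are in scope (sample data is illustrative):

  import org.apache.spark.sql.functions.count

  val df = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("k", "v")
  df.groupBy("k").agg(count($"v")).show()   // count(e: Column)
  df.groupBy("k").agg(count("v")).show()    // count(columnName: String) returns a TypedColumn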

Re: spark-1.6.1-bin-without-hadoop can not use spark-sql

2016-06-22 Thread Ted Yu
hat on param --hadoop, 2.7.2 or others? > Sent from my Huawei phone > > Original message > Subject: Re: spark-1.6.1-bin-without-hadoop can not use spark-sql > From: Ted Yu > To: 喜之郎 <251922...@qq.com> > Cc: user > > I wonder if the tar ball was built with: > > -Phive -Phi

Re: spark-1.6.1-bin-without-hadoop can not use spark-sql

2016-06-22 Thread Ted Yu
I wonder if the tar ball was built with: -Phive -Phive-thriftserver Maybe rebuild by yourself with the above ? FYI On Wed, Jun 22, 2016 at 4:38 AM, 喜之郎 <251922...@qq.com> wrote: > Hi all. > I download spark-1.6.1-bin-without-hadoop.tgz >
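
For reference, a typical rebuild command for a 1.6.x source tree with those profiles (the hadoop profile shown is illustrative):

  ./build/mvn -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver -DskipTests clean package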

Re: Spark 1.5.2 - Different results from reduceByKey over multiple iterations

2016-06-22 Thread Ted Yu
For the run which returned incorrect result, did you observe any error (on workers) ? Cheers On Tue, Jun 21, 2016 at 10:42 PM, Nirav Patel wrote: > I have an RDD[String, MyObj] which is a result of Join + Map operation. It > has no partitioner info. I run reduceByKey

Re: scala.NotImplementedError: put() should not be called on an EmptyStateMap while doing stateful computation on spark streaming

2016-06-21 Thread Ted Yu
Are you using 1.6.1 ? If not, does the problem persist when you use 1.6.1 ? Thanks > On Jun 20, 2016, at 11:16 PM, umanga wrote: > > I am getting following warning while running stateful computation. The state > consists of BloomFilter (stream-lib) as Value and Integer

Re: Build Spark 2.0 succeeded but could not run it on YARN

2016-06-20 Thread Ted Yu
What operations did you run in the Spark shell ? It would be easier for other people to reproduce using your code snippet. Thanks On Mon, Jun 20, 2016 at 6:20 PM, Jeff Zhang wrote: > Could you check the yarn app logs for details ? run command "yarn logs > -applicationId " to

Re: Accessing system environment on Spark Worker

2016-06-19 Thread Ted Yu
Have you looked at http://spark.apache.org/docs/latest/ec2-scripts.html ? There is description on setting AWS_SECRET_ACCESS_KEY. On Sun, Jun 19, 2016 at 4:46 AM, Mohamed Taher AlRefaie wrote: > Hello all: > > I have an application that requires accessing DynamoDB tables. Each

Re: How to cause a stage to fail (using spark-shell)?

2016-06-19 Thread Ted Yu
You can utilize a counter in external storage (NoSQL e.g.) When the counter reaches 2, stop throwing exception so that the task passes. FYI On Sun, Jun 19, 2016 at 3:22 AM, Jacek Laskowski wrote: > Hi, > > Thanks Burak for the idea, but it *only* fails the tasks that >
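
A simpler variant of the same idea uses the task's own attempt counter instead of external storage; a sketch that fails the first two attempts of each task and lets the retry pass (assumes spark.task.maxFailures, default 4, exceeds 2):

  import org.apache.spark.TaskContext

  sc.parallelize(1 to 10, 2).map { x =>
    if (TaskContext.get.attemptNumber < 2) sys.error("simulated failure")
    x
  }.count()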

Re: Switching broadcast mechanism from torrrent

2016-06-19 Thread Ted Yu
t;>>> at scala.collection.immutable.List.foreach(List.scala:318) >>>> at org.apache.spark.broadcast.TorrentBroadcast.org >>>> $apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:120) >>>> at &g

Re: Dataset Select Function after Aggregate Error

2016-06-18 Thread Ted Yu
scala> ds.groupBy($"_1").count.select(expr("_1").as[String], expr("count").as[Long]) res0: org.apache.spark.sql.Dataset[(String, Long)] = [_1: int, count: bigint] scala> ds.groupBy($"_1").count.select(expr("_1").as[String], expr("count").as[Long]).show +---+-----+ | _1|count| +---+-----+ | 1|

Re: spark-xml - xml parsing when rows only have attributes

2016-06-17 Thread Ted Yu
Please see https://github.com/databricks/spark-xml/issues/92 On Fri, Jun 17, 2016 at 5:19 AM, VG wrote: > I am using spark-xml for loading data and creating a data frame. > > If xml element has sub elements and values, then it works fine. Example > if the xml element is like

Re: Spark jobs without a login

2016-06-16 Thread Ted Yu
Can you describe more about the container ? Please show complete stack trace for the exception. Thanks On Thu, Jun 16, 2016 at 1:32 PM, jay vyas wrote: > Hi spark: > > Is it possible to avoid reliance on a login user when running a spark job? > > I'm running out a

Re: Kerberos setup in Apache spark connecting to remote HDFS/Yarn

2016-06-16 Thread Ted Yu
bq. Caused by: KrbException: Cannot locate default realm Can you show the rest of the stack trace ? What versions of Spark / Hadoop are you using ? Which version of Java are you using (local and in cluster) ? Thanks On Thu, Jun 16, 2016 at 6:32 AM, akhandeshi wrote:

Re: Reporting warnings from workers

2016-06-15 Thread Ted Yu
Have you looked at: https://spark.apache.org/docs/latest/programming-guide.html#accumulators On Wed, Jun 15, 2016 at 1:24 PM, Mathieu Longtin wrote: > Is there a way to report warnings from the workers back to the driver > process? > > Let's say I have an RDD and do
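
A rough sketch of the accumulator approach on the 1.6 API (input path is illustrative; sc.accumulator was later deprecated in 2.0 in favor of sc.longAccumulator):

  val warnings = sc.accumulator(0L, "parse warnings")
  val cleaned = sc.textFile("/tmp/input.txt").map { line =>
    if (line.isEmpty) warnings += 1L    // recorded on workers
    line.trim
  }
  cleaned.count()                       // accumulators only update once an action runs
  println(s"warnings: ${warnings.value}")   // read back on the driver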

Re: Spark 2.0 release date

2016-06-15 Thread Ted Yu
Andy: You should sense the tone in Mich's response. To my knowledge, there hasn't been an RC for the 2.0 release yet. Once we have an RC, it goes through the normal voting process. FYI On Wed, Jun 15, 2016 at 7:38 AM, andy petrella wrote: > > tomorrow lunch time >

Re: hivecontext error

2016-06-14 Thread Ted Yu
Which release of Spark are you using ? Can you show the full error trace ? Thanks On Tue, Jun 14, 2016 at 6:33 PM, Tejaswini Buche < tejaswini.buche0...@gmail.com> wrote: > I am trying to use hivecontext in spark. The following statements are > running fine : > > from pyspark.sql import

Re: MAtcheERROR : STRINGTYPE

2016-06-14 Thread Ted Yu
Can you give a bit more detail ? version of Spark, complete error trace, code snippet which reproduces the error. On Tue, Jun 14, 2016 at 9:54 AM, pseudo oduesp wrote: > hello > > why i get this error > > when using > > assembleur = VectorAssembler( inputCols=l_CDMVT,

Re: Spark Streaming application failing with Kerboros issue while writing data to HBase

2016-06-13 Thread Ted Yu
Can you show snippet of your code, please ? Please refer to obtainTokenForHBase() in yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala Cheers On Mon, Jun 13, 2016 at 4:44 AM, Kamesh wrote: > Hi All, > We are building a spark streaming

Re: Basic question. Access MongoDB data in Spark.

2016-06-13 Thread Ted Yu
Have you considered posting the question on stratio's mailing list ? You may get faster response there. On Mon, Jun 13, 2016 at 8:09 AM, Umair Janjua wrote: > Hi guys, > > I have this super basic problem which I cannot figure out. Can somebody > give me a hint. > >

Re: Spark Getting data from MongoDB in JAVA

2016-06-12 Thread Ted Yu
What's the value of spark.version ? Do you know which version of Spark mongodb connector 0.10.3 was built against ? You can use the following command to find out: mvn dependency:tree Maybe the Spark version you use is different from what mongodb connector was built against. On Fri, Jun 10,

Re: Book for Machine Learning (MLIB and other libraries on Spark)

2016-06-11 Thread Ted Yu
> HTH > > Dr Mich Talebzadeh > > LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw > > http://talebzadehmic

Re: Book for Machine Learning (MLIB and other libraries on Spark)

2016-06-11 Thread Ted Yu
https://www.amazon.com/Machine-Learning-Spark-Powerful-Algorithms/dp/1783288515/ref=sr_1_1?ie=UTF8&qid=1465657706&sr=8-1&keywords=spark+mllib https://www.amazon.com/Spark-Practical-Machine-Learning-Chinese/dp/7302420424/ref=sr_1_3?ie=UTF8&qid=1465657706&sr=8-3&keywords=spark+mllib

Re: OutOfMemory when doing joins in spark 2.0 while same code runs fine in spark 1.5.2

2016-06-09 Thread Ted Yu
bq. Read data from hbase using custom DefaultSource (implemented using TableScan) Did you use the DefaultSource from hbase-spark module in hbase master branch ? If you wrote your own, mind sharing related code ? Thanks On Thu, Jun 9, 2016 at 2:53 AM, raaggarw wrote: > Hi,

Re: Write Ahead Log

2016-06-08 Thread Ted Yu
There was a minor typo in the name of the config: spark.streaming.receiver.writeAheadLog.enable Yes, it only applies to Streaming. On Wed, Jun 8, 2016 at 3:14 PM, Mohit Anchlia wrote: > Is something similar to park.streaming.receiver.writeAheadLog.enable > available on
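
A minimal sketch of enabling the receiver write-ahead log (app name, paths, and batch interval are illustrative; the WAL also needs a reliable checkpoint directory):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val conf = new SparkConf()
    .setAppName("wal-example")
    .set("spark.streaming.receiver.writeAheadLog.enable", "true")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint("hdfs:///tmp/streaming-checkpoints")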

Re: comparaing row in pyspark data frame

2016-06-08 Thread Ted Yu
Do you mean returning col3 and 0.4 for the example row below ? > On Jun 8, 2016, at 5:05 AM, pseudo oduesp wrote: > > Hi, > how we can compare multiples columns in datframe i mean > > if df it s dataframe like that : > >df.col1 | df.col2 |

Re: Apache design patterns

2016-06-07 Thread Ted Yu
I think this is the correct forum. Please describe your case. > On Jun 7, 2016, at 8:33 PM, Francois Le Roux wrote: > > HI folks, I have been working through the available online Apache spark > tutorials and I am stuck with a scenario that i would like to solve in
