Re: Query around Spark Checkpoints

2020-09-29 Thread Bryan Jeffrey
Jungtaek, How would you contrast stateful streaming with checkpoint vs. the idea of writing updates to a Delta Lake table, and then using the Delta Lake table as a streaming source for our state stream? Thank you, Bryan On Mon, Sep 28, 2020 at 9:50 AM Debabrata Ghosh wrote: > Thank You

Re: Metrics Problem

2020-07-10 Thread Bryan Jeffrey
On Thu, Jul 2, 2020 at 2:33 PM Bryan Jeffrey wrote: > Srinivas, > > I finally freed up a little bit of time to look at this issue. I > reduced the scope of my ambitions and simply cloned the ConsoleSink and > ConsoleReporter class. After doing so I can see the original

Re: Metrics Problem

2020-06-28 Thread Bryan Jeffrey
Srinivas, Interestingly, I did have the metrics jar packaged as part of my main jar. It worked well both on the driver and locally, but not on executors. Regards, Bryan Jeffrey From: Srinivas V Sent: Saturday

Re: Metrics Problem

2020-06-26 Thread Bryan Jeffrey
providers which appear to work. Regards, Bryan From: Srinivas V Sent: Friday, June 26, 2020 9:47:52 PM To: Bryan Jeffrey Cc: user Subject: Re: Metrics Problem It should work when you are giving hdfs path a

Re: Metrics Problem

2020-06-26 Thread Bryan Jeffrey
It may be helpful to note that I'm running in Yarn cluster mode. My goal is to avoid having to manually distribute the JAR to all of the various nodes as this makes versioning deployments difficult. On Thu, Jun 25, 2020 at 5:32 PM Bryan Jeffrey wrote: > Hello. > > I am running Spark

Metrics Problem

2020-06-25 Thread Bryan Jeffrey
Hello. I am running Spark 2.4.4. I have implemented a custom metrics producer. It works well when I run locally, or specify the metrics producer only for the driver. When I ask for executor metrics I run into ClassNotFoundExceptions. *Is it possible to pass a metrics JAR via --jars? If so what

Custom Metrics

2020-06-18 Thread Bryan Jeffrey
. Is there a suggested mechanism? Thank you, Bryan Jeffrey

Data Source - State (SPARK-28190)

2020-03-30 Thread Bryan Jeffrey
to be potentially addressed by your ticket SPARK-28190 - "Data Source - State". I see little activity on this ticket. Can you help me to understand where this feature currently stands? Thank you, Bryan Jeffrey

Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-29 Thread Bryan Jeffrey
like to come up to speed on the right way to validate changes and perhaps contribute myself. Regards, Bryan Jeffrey On Sat, Feb 29, 2020 at 9:52 AM Jungtaek Lim wrote: > Forgot to mention - it only occurs when the SQL type of the UDT has a fixed > length. If the UDT is used to represent c

Fwd: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-28 Thread Bryan Jeffrey
github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/types/SQLUserDefinedType.java>` > on the class definition. You don't seem to have done it, maybe that's the > reason? > > I would debug by printing the values in the serialize/deserialize methods,

Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-28 Thread Bryan Jeffrey
Perfect. I'll give this a shot and report back. From: Tathagata Das Sent: Friday, February 28, 2020 6:23:07 PM To: Bryan Jeffrey Cc: user Subject: Re: Structured Streaming: mapGroupsWithState UDT seriali

Re: Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-28 Thread Bryan Jeffrey
serialization bug also causes issues outside of stateful streaming, such as when executing a simple group by. Regards, Bryan Jeffrey From: Tathagata Das Sent: Friday, February 28, 2020 4:56:07 PM To: Bryan Jeffr

Structured Streaming: mapGroupsWithState UDT serialization does not work

2020-02-28 Thread Bryan Jeffrey
FooWithDate = FooWithDate(b.date, a.s + b.s, a.i + b.i) } The test output shows the invalid date: org.scalatest.exceptions.TestFailedException: Array(FooWithDate(2021-02-02T19:26:23.374Z,Foo,1), FooWithDate(2021-02-02T19:26:23.374Z,FooFoo,6)) did not contain the same elements as Array(FooWithDate(2020-01-02T03:04:05.006Z,Foo,1), FooWithDate(2020-01-02T03:04:05.006Z,FooFoo,6)) Is this something folks have encountered before? Thank you, Bryan Jeffrey

Accumulator v2

2020-01-21 Thread Bryan Jeffrey
, Bryan Jeffrey

Structured Streaming & Enrichment Broadcasts

2019-11-18 Thread Bryan Jeffrey
read an external data source and do a fast lookup for a streaming input. One option appears to be to do a broadcast left outer join? In the past this mechanism has been less easy to performance tune than doing an explicit broadcast and lookup. Regards, Bryan Jeffrey
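A minimal sketch of the broadcast left outer join option mentioned above, assuming a streaming DataFrame streamDf and a small static reference DataFrame refDf (assumed names):

    import org.apache.spark.sql.functions.broadcast

    // hint Spark to broadcast the small reference side; "id" is an assumed join key
    val enriched = streamDf.join(broadcast(refDf), Seq("id"), "left_outer")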

Driver - Stops Scheduling Streaming Jobs

2019-08-27 Thread Bryan Jeffrey
anything pop out. Has anyone else seen this behavior? Any thoughts on debugging? Regards, Bryan Jeffrey

Driver OOM does not shut down Spark Context

2019-01-31 Thread Bryan Jeffrey
the Streaming application never terminated. No new batches were started. As a result my job did not process data for some period of time (until our ancillary monitoring noticed the issue). *Ask: What can we do to ensure that the driver is shut down when this type of exception occurs?* Regards, Bryan

Re: ConcurrentModificationExceptions with CachedKafkaConsumers

2018-08-31 Thread Bryan Jeffrey
Cody, Yes - I was able to verify that I am not seeing duplicate calls to createDirectStream. If the spark-streaming-kafka-0-10 will work on a 2.3 cluster I can go ahead and give that a shot. Regards, Bryan Jeffrey On Fri, Aug 31, 2018 at 11:56 AM Cody Koeninger wrote: > Just to be 100% s

Re: ConcurrentModificationExceptions with CachedKafkaConsumers

2018-08-31 Thread Bryan Jeffrey
. Regards, Bryan Jeffrey On Thu, Aug 30, 2018 at 2:56 PM Cody Koeninger wrote: > I doubt that fix will get backported to 2.3.x > > Are you able to test against master? 2.4 with the fix you linked to > is likely to hit code freeze soon. > > From a quick look at your code

ConcurrentModificationExceptions with CachedKafkaConsumers

2018-08-30 Thread Bryan Jeffrey
ructType(Array(StructField("value", BinaryType
df.selectExpr("CAST('' AS STRING)", "value")
  .write
  .format("kafka")
  .option("kafka.bootstrap.servers", getBrokersToUse(brokers))
  .option("compression.type", "gzip")
  .option("retries", "3")
  .option("topic", topic)
  .save()
}
Regards, Bryan Jeffrey
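A cleaned-up, self-contained sketch of the writer fragment above, assuming Spark 2.x with the spark-sql-kafka-0-10 package on the classpath and the broker list already resolved to a string; producer settings are passed with the "kafka." prefix the sink expects:

    import org.apache.spark.sql.DataFrame

    def writeToKafka(df: DataFrame, brokers: String, topic: String): Unit = {
      // the Kafka sink reads "key"/"value" columns; the empty key mirrors the original snippet
      df.selectExpr("CAST('' AS STRING) AS key", "value")
        .write
        .format("kafka")
        .option("kafka.bootstrap.servers", brokers)
        .option("kafka.compression.type", "gzip")
        .option("kafka.retries", "3")
        .option("topic", topic)
        .save()
    }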

Heap Memory in Spark 2.3.0

2018-07-16 Thread Bryan Jeffrey
ocation in Spark 2.3.0 that would cause these issues? Thank you for any help you can provide. Regards, Bryan Jeffrey

Re: Kafka Offset Storage: Fetching Offsets

2018-06-14 Thread Bryan Jeffrey
:01:01 PM To: Bryan Jeffrey Cc: user Subject: Re: Kafka Offset Storage: Fetching Offsets Offsets are loaded when you instantiate an org.apache.kafka.clients.consumer.KafkaConsumer, subscribe, and poll. There's not an explicit api for it. Have you looked at the output of kafka-consumer-gro

Re: Kafka Offset Storage: Fetching Offsets

2018-06-14 Thread Bryan Jeffrey
Cody, Where is that called in the driver? The only call I see from Subscribe is to load the offset from checkpoint. From: Cody Koeninger Sent: Thursday, June 14, 2018 4:24:58 PM To: Bryan Jeffrey Cc: user S

Re: Kafka Offset Storage: Fetching Offsets

2018-06-14 Thread Bryan Jeffrey
:31 PM To: Bryan Jeffrey Cc: user Subject: Re: Kafka Offset Storage: Fetching Offsets The expectation is that you shouldn't have to manually load offsets from kafka, because the underlying kafka consumer on the driver will start at the offsets associated with the given group id. That's the behavior

Kafka Offset Storage: Fetching Offsets

2018-06-14 Thread Bryan Jeffrey
reaming Kafka library. How is this expected to work? Is there an example of saving the offsets to Kafka and then loading them from Kafka? Regards, Bryan Jeffrey

Re: [Spark 2.x Core] Adding to ArrayList inside rdd.foreach()

2018-04-07 Thread Bryan Jeffrey
You can just call rdd.flatMap(_._2).collect From: klrmowse Sent: Saturday, April 7, 2018 1:29:34 PM To: user@spark.apache.org Subject: Re: [Spark 2.x Core] Adding to ArrayList inside
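For illustration, a minimal sketch of that call on an assumed RDD of (key, Seq[Int]) pairs; it avoids mutating a driver-side list inside foreach:

    val rdd = sc.parallelize(Seq(("a", Seq(1, 2)), ("b", Seq(3))))
    val values: Array[Int] = rdd.flatMap(_._2).collect()
    // values: Array(1, 2, 3)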

Re: Return statements aren't allowed in Spark closures

2018-02-21 Thread Bryan Jeffrey
Lian, You're writing Scala. Just remove the 'return'. No need for it in Scala. From: Lian Jiang Sent: Wednesday, February 21, 2018 4:16:08 PM To: user Subject: Return statements aren't

Stopping a Spark Streaming Context gracefully

2017-11-07 Thread Bryan Jeffrey
. I looked in Jira and did not see an open issue. Is this a known problem? If not I'll open a bug. Regards, Bryan Jeffrey

Re: about broadcast join of base table in spark sql

2017-06-30 Thread Bryan Jeffrey
Hello. If you want to allow a broadcast join with larger broadcasts you can set spark.sql.autoBroadcastJoinThreshold to a higher value. This will cause the plan to allow the join despite 'A' being larger than the default threshold. From: paleyl Sent:
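A minimal sketch of the two common ways to get this behavior, assuming Spark 2.x with a SparkSession named spark and DataFrames a and b (assumed names):

    // raise the size threshold (in bytes) under which Spark will broadcast a join side
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", (100 * 1024 * 1024).toString)

    // or force a broadcast of one side explicitly, regardless of the threshold; "id" is an assumed join key
    import org.apache.spark.sql.functions.broadcast
    val joined = b.join(broadcast(a), Seq("id"))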

Re: Question about Parallel Stages in Spark

2017-06-27 Thread Bryan Jeffrey
A and B. Jeffrey's code did not cause two submits. ---Original--- From: "Pralabh Kumar"<pralabhku...@gmail.com> Date: 2017/6/27 12:09:27 To: "萝卜丝炒饭"<1427357...@qq.com>; Cc: "user"<user@spark.apache.org>; "satishl"<satish.la...@gmail.com>;

Re: Question about Parallel Stages in Spark

2017-06-26 Thread Bryan Jeffrey
Hello. The driver is running the individual operations in series, but each operation is parallelized internally. If you want them to run in parallel you need to provide the driver a mechanism to thread the job scheduling out:
val rdd1 = sc.parallelize(1 to 10)
val rdd2 = sc.parallelize(1 to
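A minimal sketch of threading the job submission from the driver, assuming the operations are independent; the map/count bodies are placeholders:

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration

    val rdd1 = sc.parallelize(1 to 10)
    val rdd2 = sc.parallelize(1 to 10)

    // each Future submits an independent job, so Spark can schedule them concurrently
    val f1 = Future { rdd1.map(_ * 2).count() }
    val f2 = Future { rdd2.map(_ + 1).count() }
    val results = Await.result(Future.sequence(Seq(f1, f2)), Duration.Inf)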

Re: Broadcasts & Storage Memory

2017-06-21 Thread Bryan Jeffrey
of that storage memory fraction. Bryan Jeffrey On Wed, Jun 21, 2017 at 6:48 PM -0400, "satish lalam" <satish.la...@gmail.com> wrote: My understanding is - it is from storageFraction. Here cached blocks are immune to eviction - so bot

Broadcasts & Storage Memory

2017-06-21 Thread Bryan Jeffrey
the storage memory for cached RDDs. You end up with executor memory that looks like the following:
All memory: 0-100
Spark memory: 0-75
RDD Storage: 0-37
Other Spark: 38-75
Other Reserved: 76-100
Where do broadcast variables fall into the mix? Regards, Bryan Jeffrey

Re: how many topics spark streaming can handle

2017-06-19 Thread Bryan Jeffrey
Hello Ashok, We're consuming from more than 10 topics in some Spark streaming applications. Topic management is a concern (what is read from where, etc), but I have seen no issues from Spark itself. Regards, Bryan Jeffrey On Mon, Jun 19, 2017

Re: Scala, Python or Java for Spark programming

2017-06-07 Thread Bryan Jeffrey
provides a lot of similar features, but the amount of typing required to set down a small function is excessive at best! Regards, Bryan Jeffrey On Wed, Jun 7, 2017 at 12:51 PM, Jörn Franke <jornfra...@gmail.com> wrote: > I think this is a religious question ;-) > Java is often un

Re: Is there a way to do conditional group by in spark 2.1.1?

2017-06-03 Thread Bryan Jeffrey
You should be able to project a new column that is your group column. Then you can group on the projected column. On Sat, Jun 3, 2017 at 6:26 PM -0400, "upendra 1991" wrote: Use a function
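A minimal sketch of the project-then-group approach, assuming a DataFrame df with a numeric "amount" column (assumed names):

    import org.apache.spark.sql.functions.{col, when}

    // derive the conditional grouping column, then group on it
    val withGroup = df.withColumn("grp", when(col("amount") >= 100, "large").otherwise("small"))
    val grouped = withGroup.groupBy("grp").count()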

Re: Convert camelCase to snake_case when saving Dataframe/Dataset to parquet?

2017-05-22 Thread Bryan Jeffrey
Mike, I have code to do that. I'll share it tomorrow. On Mon, May 22, 2017 at 4:53 PM -0400, "Mike Wheeler" wrote: Hi Spark User, For Scala case class, we usually use camelCase (carType) for member fields. However,
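The code referred to above is not included in this excerpt; a hedged sketch of one way to rename camelCase columns to snake_case before writing, assuming a DataFrame df:

    // e.g. carType -> car_type
    def toSnakeCase(name: String): String =
      name.replaceAll("([A-Z])", "_$1").toLowerCase

    val snakeCased = df.columns.foldLeft(df)((d, c) => d.withColumnRenamed(c, toSnakeCase(c)))
    snakeCased.write.parquet("/tmp/out")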

Re: [RDDs and Dataframes] Equivalent expressions for RDD API

2017-03-04 Thread bryan . jeffrey
RDD operation: rdd.map(x => (word, count)).reduceByKey(_+_) On Sat, Mar 4, 2017 at 8:59 AM -0500, "Old-School" wrote: Hi, I want to perform some simple transformations and check the execution time, under
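For comparison, a hedged sketch of the DataFrame counterpart of that word-count style aggregation, assuming a DataFrame with a "word" column:

    val counts = df.groupBy("word").count()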

Re: streaming-kafka-0-8-integration (direct approach) and monitoring

2017-02-14 Thread bryan . jeffrey
Mohammad, We store our offsets in Cassandra, and use that for tracking. This solved a few issues for us, as it provides a good persistence mechanism even when you're reading from multiple clusters. Bryan Jeffrey On Tue, Feb 14, 2017 at 7:03 PM

Re: How to specify "verbose GC" in Spark submit?

2017-02-06 Thread Bryan Jeffrey
ify --conf "spark.executor.extraJavaOptions=-XX:+PrintFlagsFinal -verbose:gc", etc. Bryan Jeffrey On Mon, Feb 6, 2017 at 8:02 AM, Md. Rezaul Karim < rezaul.ka...@insight-centre.org> wrote: > Dear All, > > Is there any way to specify verbose GC -i.e. “-verbose:gc > -XX:

Re: Streaming Batch Oddities

2016-12-13 Thread Bryan Jeffrey
All, Any thoughts? I can run another couple of experiments to try to narrow the problem. The total data volume in the repartition is around 60GB / batch. Regards, Bryan Jeffrey On Tue, Dec 13, 2016 at 12:11 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote: > Hello. > > I

Streaming Batch Oddities

2016-12-13 Thread Bryan Jeffrey
which calls coalesce (shuffle=true), which creates a new ShuffledRDD with a HashPartitioner. These calls appear functionally equivalent - I am having trouble coming up with a justification for the significant performance differences between calls. Help? Regards, Bryan Jeffrey

Event Log Compression

2016-07-26 Thread Bryan Jeffrey
? Thank you, Bryan Jeffrey

Spark 2.0

2016-07-25 Thread Bryan Jeffrey
be willing to go fix it myself). Should I just create a ticket? Thank you, Bryan Jeffrey

Re: Kafka Exceptions

2016-06-13 Thread Bryan Jeffrey
Cody, We already set the maxRetries. We're still seeing the issue - when the leader is shifted, for example, it does not appear that the direct stream reader correctly handles this. We're running 1.6.1. Bryan Jeffrey On Mon, Jun 13, 2016 at 10:37 AM, Cody Koeninger <c...@koeninger.org> wrote:

Kafka Exceptions

2016-06-13 Thread Bryan Jeffrey
you, Bryan Jeffrey

Re: Dataset - reduceByKey

2016-06-07 Thread Bryan Jeffrey
All, Thank you for the replies. It seems as though the Dataset API is still far behind the RDD API. This is unfortunate as the Dataset API potentially provides a number of performance benefits. I will move to using it in a more limited set of cases for the moment. Thank you! Bryan Jeffrey

Re: Dataset - reduceByKey

2016-06-07 Thread Bryan Jeffrey
It would also be nice if there was a better example of joining two Datasets. I am looking at the documentation here: http://spark.apache.org/docs/latest/sql-programming-guide.html. It seems a little bit sparse - is there a better documentation source? Regards, Bryan Jeffrey On Tue, Jun 7, 2016

Dataset - reduceByKey

2016-06-07 Thread Bryan Jeffrey
Hello. I am looking at the option of moving RDD based operations to Dataset based operations. We are calling 'reduceByKey' on some pair RDDs we have. What would the equivalent be in the Dataset interface - I do not see a simple reduceByKey replacement. Regards, Bryan Jeffrey
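A minimal sketch of the closest Dataset equivalent, assuming Spark 2.x with spark.implicits._ in scope and a Dataset of (String, Int) pairs:

    val ds = Seq(("a", 1), ("a", 2), ("b", 3)).toDS()

    // groupByKey + reduceGroups plays the role of reduceByKey
    val reduced = ds
      .groupByKey { case (k, _) => k }
      .reduceGroups((a, b) => (a._1, a._2 + b._2))
      .map { case (_, kv) => kv }  // drop the duplicated key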

Streaming application slows over time

2016-05-09 Thread Bryan Jeffrey
for or known bugs in similar instances? Regards, Bryan Jeffrey

Issues with Long Running Streaming Application

2016-04-25 Thread Bryan Jeffrey
, Bryan Jeffrey

Re: Spark 1.6.1. How to prevent serialization of KafkaProducer

2016-04-21 Thread Bryan Jeffrey
ns = kafkaWritePartitions) detectionWriter.write(dataToWriteToKafka) Hope that helps! Bryan Jeffrey On Thu, Apr 21, 2016 at 2:08 PM, Alexander Gallego <agall...@concord.io> wrote: > Thanks Ted. > > KafkaWordCount (producer) does not operate on a DStream[T

Re: ERROR ArrayBuffer(java.nio.channels.ClosedChannelException

2016-03-19 Thread Bryan Jeffrey
Cody et al., I am seeing a similar error. I've increased the number of retries. Once I've got a job up and running I'm seeing it retry correctly. However, I am having trouble getting the job started - the number of retries does not seem to help with startup behavior. Thoughts? Regards, Bryan

Re: OOM Exception in my spark streaming application

2016-03-14 Thread Bryan Jeffrey
Steve & Adam, I would be interested in hearing the outcome here as well. I am seeing some similar issues in my 1.4.1 pipeline, using stateful functions (reduceByKeyAndWindow and updateStateByKey). Regards, Bryan Jeffrey On Mon, Mar 14, 2016 at 6:45 AM, Steve Loughran <ste...@hortonwo

Re: Spark job for Reading time series data from Cassandra

2016-03-10 Thread Bryan Jeffrey
Prateek, I believe that one task is created per Cassandra partition. How is your data partitioned? Regards, Bryan Jeffrey On Thu, Mar 10, 2016 at 10:36 AM, Prateek . <prat...@aricent.com> wrote: > Hi, > > > > I have a Spark Batch job for reading timeseries data from

Suggested Method to Write to Kafka

2016-03-01 Thread Bryan Jeffrey
Hello. Is there a suggested method and/or some example code to write results from a Spark streaming job back to Kafka? I'm using Scala and Spark 1.4.1. Regards, Bryan Jeffrey

Re: Spark with .NET

2016-02-09 Thread Bryan Jeffrey
Arko, Check this out: https://github.com/Microsoft/SparkCLR This is a Microsoft authored C# language binding for Spark. Regards, Bryan Jeffrey On Tue, Feb 9, 2016 at 3:13 PM, Arko Provo Mukherjee < arkoprovomukher...@gmail.com> wrote: > Doesn't seem to be supported, but

Re: Access batch statistics in Spark Streaming

2016-02-08 Thread Bryan Jeffrey
From within a Spark job you can use a Periodic Listener:
ssc.addStreamingListener(PeriodicStatisticsListener(Seconds(60)))
class PeriodicStatisticsListener(timePeriod: Duration) extends StreamingListener {
  private val logger = LoggerFactory.getLogger("Application")
  override def
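The class above is truncated; a minimal self-contained sketch of a StreamingListener that logs batch statistics (the periodic roll-up from the original is omitted):

    import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}
    import org.slf4j.LoggerFactory

    class BatchStatisticsListener extends StreamingListener {
      private val logger = LoggerFactory.getLogger("Application")

      override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
        val info = batchCompleted.batchInfo
        logger.info(s"Batch ${info.batchTime}: ${info.numRecords} records, " +
          s"scheduling delay ${info.schedulingDelay.getOrElse(-1L)} ms, " +
          s"processing time ${info.processingDelay.getOrElse(-1L)} ms")
      }
    }

    // registered the same way as above: ssc.addStreamingListener(new BatchStatisticsListener)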

Error w/ Invertable ReduceByKeyAndWindow

2016-02-01 Thread Bryan Jeffrey
I am sure we're doing consistent hashing. The 'reduceAdd' function is adding to a map. The 'inverseReduceFunction' is subtracting from the map. The filter function is removing items where the number of entries in the map is zero. Has anyone seen this error before? Regards, Bryan Jeffrey

Re: Error w/ Invertable ReduceByKeyAndWindow

2016-02-01 Thread Bryan Jeffrey
Excuse me - I should have mentioned: I am running Spark 1.4.1, Scala 2.11. I am running in streaming mode receiving data from Kafka. Regards, Bryan Jeffrey On Mon, Feb 1, 2016 at 9:19 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote: > Hello. > > I have a reduceByKeyAnd

Re: Hive error after update from 1.4.1 to 1.5.2

2015-12-16 Thread Bryan Jeffrey
let me know what was the > resolution? > > Thanks, > Ashwin > > On Fri, Nov 20, 2015 at 12:07 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> > wrote: > >> Nevermind. I had a library dependency that still had the old Spark >> version. >> >> On Fr

DateTime Support - Hive Parquet

2015-11-23 Thread Bryan Jeffrey
with 1.5.2 - however, I am still seeing the associated errors. Is there a bug I can follow to determine when DateTime will be supported for Parquet? Regards, Bryan Jeffrey

Hive error after update from 1.4.1 to 1.5.2

2015-11-20 Thread Bryan Jeffrey
Hello. I'm seeing an error creating a Hive Context moving from Spark 1.4.1 to 1.5.2. Has anyone seen this issue? I'm invoking the following: new HiveContext(sc) // sc is a Spark Context I am seeing the following error: SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding

Re: Hive error after update from 1.4.1 to 1.5.2

2015-11-20 Thread Bryan Jeffrey
The 1.5.2 Spark was compiled using the following options: mvn -Dhadoop.version=2.6.1 -Dscala-2.11 -DskipTests -Pyarn -Phive -Phive-thriftserver clean package Regards, Bryan Jeffrey On Fri, Nov 20, 2015 at 2:13 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote: > Hello. > > I'm

Re: Hive error after update from 1.4.1 to 1.5.2

2015-11-20 Thread Bryan Jeffrey
Nevermind. I had a library dependency that still had the old Spark version. On Fri, Nov 20, 2015 at 2:14 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote: > The 1.5.2 Spark was compiled using the following options: mvn > -Dhadoop.version=2.6.1 -Dscala-2.11 -DskipTests -Pyarn -Ph

Re: Cassandra via SparkSQL/Hive JDBC

2015-11-12 Thread Bryan Jeffrey
master spark://10.0.0.4:7077 --packages com.datastax.spark:spark-cassandra-connector_2.11:1.5.0-M1 --hiveconf "spark.cores.max=2" --hiveconf "spark.executor.memory=2g" Do I perhaps need to include an additional library to do the default conversion? Regards, Bryan Jeffrey On Th

Re: Cassandra via SparkSQL/Hive JDBC

2015-11-12 Thread Bryan Jeffrey
Yes, I do - I found your example of doing that later in your slides. Thank you for your help! On Thu, Nov 12, 2015 at 12:20 PM, Mohammed Guller <moham...@glassbeam.com> wrote: > Did you mean Hive or Spark SQL JDBC/ODBC server? > > > > Mohammed > > > > *From:*

Re: Cassandra via SparkSQL/Hive JDBC

2015-11-12 Thread Bryan Jeffrey
TIONS ( keyspace "c2", table "detectionresult" ); ]Error: java.io.IOException: Failed to open native connection to Cassandra at {10.0.0.4}:9042 (state=,code=0) This seems to be connecting to local host regardless of the value I set spark.cassandra.connection.host to. Regards,

Re: Cassandra via SparkSQL/Hive JDBC

2015-11-12 Thread Bryan Jeffrey
Mohammed, That is great. It looks like a perfect scenario. Would I be able to make the created DF queryable over the Hive JDBC/ODBC server? Regards, Bryan Jeffrey On Wed, Nov 11, 2015 at 9:34 PM, Mohammed Guller <moham...@glassbeam.com> wrote: > Short answer: yes. > > > >

Re: Cassandra via SparkSQL/Hive JDBC

2015-11-12 Thread Bryan Jeffrey
Answer: In beeline run the following: SET spark.cassandra.connection.host="10.0.0.10" On Thu, Nov 12, 2015 at 1:13 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote: > Mohammed, > > While you're willing to answer questions, is there a trick to getting the > H

Spark Dynamic Partitioning Bug

2015-11-05 Thread Bryan Jeffrey
the manually calculated fields are correct. However, the dynamically calculated (string) partition for idAndSource is a random field from within my case class. I've duplicated this with several other classes and have seen the same result (I use this example because it's very simple). Any idea if this is a known bug? Is there a workaround? Regards, Bryan Jeffrey

Re: Allow multiple SparkContexts in Unit Testing

2015-11-04 Thread Bryan Jeffrey
SparkConf().set("spark.driver.allowMultipleContexts", "true").setAppName(appName).setMaster(master)
new StreamingContext(conf, Seconds(seconds))
}
Regards, Bryan Jeffrey On Wed, Nov 4, 2015 at 9:49 AM, Ted Yu <yuzhih...@gmail.com> wrote: > Are you trying to sp
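The fragment above appears to come from a test helper; a self-contained sketch under the same assumptions (appName, master, and the batch duration supplied by the test):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    def createTestStreamingContext(appName: String, master: String, seconds: Long): StreamingContext = {
      val conf = new SparkConf()
        .set("spark.driver.allowMultipleContexts", "true")
        .setAppName(appName)
        .setMaster(master)
      new StreamingContext(conf, Seconds(seconds))
    }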

SparkSQL implicit conversion on insert

2015-11-02 Thread Bryan Jeffrey
conversion prior to insertion? Regards, Bryan Jeffrey

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-30 Thread Bryan Jeffrey
Deenar, This worked perfectly - I moved to SQL Server and things are working well. Regards, Bryan Jeffrey On Thu, Oct 29, 2015 at 8:14 AM, Deenar Toraskar <deenar.toras...@gmail.com> wrote: > Hi Bryan > > For your use case you don't need to have multiple metastores. The defa

Hive Version

2015-10-28 Thread Bryan Jeffrey
of the Spark documentation, but do not see version specified anywhere - it would be a good addition. Thank you, Bryan Jeffrey

Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Bryan Jeffrey
essible (and not partitioned). Is there a straightforward way to write to partitioned tables using Spark SQL? I understand that the read performance for partitioned data is far better - are there other performance improvements that might be better to use instead of partitioning? Regards, Bryan Jeffrey

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Bryan Jeffrey
issues (3) When partitioning without maps I see frequent out of memory issues. I'll update this email when I've got a more concrete example of problems. Regards, Bryan Jeffrey On Wed, Oct 28, 2015 at 1:33 PM, Susan Zhang <suchenz...@gmail.com> wrote: > Have you tried partitionBy? >

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Bryan Jeffrey
every time. Is this a known issue? Is there a workaround? Regards, Bryan Jeffrey On Wed, Oct 28, 2015 at 3:13 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote: > Susan, > > I did give that a shot -- I'm seeing a number of oddities: > > (1) 'Partition By' appears to only accept

Re: Spark -- Writing to Partitioned Persistent Table

2015-10-28 Thread Bryan Jeffrey
MetadataTypedColumnsetSerDe
InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
This seems like a pretty big bug associated with persistent tables. Am I missing a step somewhere? Thank you, Bryan Jeffrey On Wed, Oct 28, 2015 at 4:10

Spark SQL Persistent Table - joda DateTime Compatability

2015-10-27 Thread Bryan Jeffrey
me to a persistent Hive table accomplished? Has anyone else run into the same issue? Regards, Bryan Jeffrey

Re: Error Compiling Spark 1.4.1 w/ Scala 2.11 & Hive Support

2015-10-26 Thread Bryan Jeffrey
All, The error resolved to a bad version of jline being pulled from Maven. The jline version is defined as 'scala.version' -- the 2.11 version does not exist in Maven. Instead the following dependency should be used:
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>jline</artifactId>
  <version>2.11.0-M3</version>
</dependency>
Regards, Bryan Jeffrey

Error Compiling Spark 1.4.1 w/ Scala 2.11 & Hive Support

2015-10-26 Thread Bryan Jeffrey
All, I'm seeing the following error compiling Spark 1.4.1 w/ Scala 2.11 & Hive support. Any ideas? mvn -Dhadoop.version=2.6.1 -Dscala-2.11 -DskipTests -Pyarn -Phive -Phive-thriftserver package [INFO] Spark Project Parent POM .. SUCCESS [4.124s] [INFO] Spark Launcher

Re: Example of updateStateByKey with initial RDD?

2015-10-08 Thread Bryan Jeffrey
t the method calls, the function that is called for appears to be the same. I was hoping an example might shed some light on the issue. Regards, Bryan Jeffrey On Thu, Oct 8, 2015 at 7:04 AM, Aniket Bhatnagar <aniket.bhatna...@gmail.com > wrote: > Here is an example: > > val

Re: Example of updateStateByKey with initial RDD?

2015-10-08 Thread Bryan Jeffrey
= initialRDD) > counts.print() > > Thanks, > Aniket > > > On Thu, Oct 8, 2015 at 5:48 PM Bryan Jeffrey <bryan.jeff...@gmail.com> > wrote: > >> Aniket, >> >> Thank you for the example - but that's not quite what I'm looking for. >
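A minimal sketch of the initial-state overload being discussed, assuming a DStream of (String, Int) pairs named wordCounts (an assumed name):

    import org.apache.spark.HashPartitioner

    val initialRDD = ssc.sparkContext.parallelize(Seq(("hello", 1), ("world", 1)))
    val updateFunc = (values: Seq[Int], state: Option[Int]) => Some(values.sum + state.getOrElse(0))

    // this overload also takes a partitioner alongside the initial RDD
    val counts = wordCounts.updateStateByKey(updateFunc, new HashPartitioner(2), initialRDD)
    counts.print()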

Re: Weird worker usage

2015-09-28 Thread Bryan Jeffrey
Nukunj, No, I'm not calling set w/ master at all. This ended up being a foolish configuration problem with my slaves file. Regards, Bryan Jeffrey On Fri, Sep 25, 2015 at 11:20 PM, N B <nb.nos...@gmail.com> wrote: > Bryan, > > By any chance, are you calling SparkConf.s

Re: Weird worker usage

2015-09-25 Thread Bryan Jeffrey
parkcheckpoint --broker kafkaBroker:9092 --topic test --numStreams 9 --threadParallelism 9 Even when I put a long-running job in the queue, none of the other nodes are anything but idle. Am I missing something obvious? Regards, Bryan Jeffrey On Fri, Sep 25, 2015 at 8:28 AM, Akhil Das <a

Re: Weird worker usage

2015-09-25 Thread Bryan Jeffrey
45 INFO SparkContext: Running Spark version 1.4.1
15/09/25 16:45:45 INFO SparkContext: Spark configuration:
spark.app.name=MainClass
spark.default.parallelism=6
spark.driver.supervise=true
spark.jars=file:/tmp/OinkSpark-1.0-SNAPSHOT-jar-with-dependencies.jar
spark.logConf=true
spark.master=local[*]
spark.rpc.askTimeout=10
spark.streaming.receiver.maxRate=500
As you can see, despite -Dmaster=spark://sparkserver:7077, the streaming context still registers the master as local[*]. Any idea why? Thank you, Bryan Jeffrey

Yarn Shutting Down Spark Processing

2015-09-22 Thread Bryan Jeffrey
need to change to allow Yarn to initialize Spark streaming (vs. batch) jobs? Thank you, Bryan Jeffrey

Suggested Method for Execution of Periodic Actions

2015-09-16 Thread Bryan Jeffrey
counts to a database. Is there a built in mechanism or established pattern to execute periodic jobs in spark streaming? Regards, Bryan Jeffrey

Problems with Local Checkpoints

2015-09-09 Thread Bryan Jeffrey
en something similar? Regards, Bryan Jeffrey

Java vs. Scala for Spark

2015-09-08 Thread Bryan Jeffrey
for Spark dev in an enterprise environment? What was the outcome? Regards, Bryan Jeffrey

Re: Java vs. Scala for Spark

2015-09-08 Thread Bryan Jeffrey
Thank you for the quick responses. It's useful to have some insight from folks already extensively using Spark. Regards, Bryan Jeffrey On Tue, Sep 8, 2015 at 10:28 AM, Sean Owen <so...@cloudera.com> wrote: > Why would Scala vs Java performance be different Ted? Relatively &

Getting Started with Spark

2015-09-08 Thread Bryan Jeffrey
Hello. We're getting started with Spark Streaming. We're working to build some unit/acceptance testing around functions that consume DStreams. The current method for creating DStreams is to populate the data by creating an InputDStream:
val input = Array(TestDataFactory.CreateEvent(123
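One hedged sketch of driving a DStream from in-memory test data, using queueStream rather than the InputDStream approach above (all names besides the Spark API are assumptions):

    import scala.collection.mutable
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(1))
    val events = Seq("event-1", "event-2")  // placeholder test events
    val queue = mutable.Queue(ssc.sparkContext.parallelize(events))
    val stream = ssc.queueStream(queue)
    stream.print()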