Re: [StructuredStreaming] HDFSBackedStateStoreProvider is leaking .crc files.

2019-06-12 Thread Gerard Maas
Ooops - linked the wrong JIRA ticket: (that other one is related) https://issues.apache.org/jira/browse/SPARK-28025 On Wed, Jun 12, 2019 at 1:21 PM Gerard Maas wrote: > Hi! > I would like to socialize this issue we are currently facing: > The Structured Streaming default CheckpointFi

[StructuredStreaming] HDFSBackedStateStoreProvider is leaking .crc files.

2019-06-12 Thread Gerard Maas
Hi! I would like to socialize this issue we are currently facing: The Structured Streaming default CheckpointFileManager leaks .crc files by leaving them behind after users of this class (like HDFSBackedStateStoreProvider) apply their cleanup methods. This results in an unbounded creation of tiny

[Structured Streaming] File source, Parquet format: use of the mergeSchema option.

2018-04-11 Thread Gerard Maas
Hi, I'm looking into the Parquet format support for the File source in Structured Streaming. The docs mention the use of the option 'mergeSchema' to merge the schemas of the part files found.[1] What would be the practical use of that in a streaming context? In its batch counterpart,
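As a rough mental model of what mergeSchema does across part files: take the union of the fields, rejecting type conflicts. A toy Java sketch under that assumption (schemas modeled as name-to-type maps; this is not Parquet's actual merge logic):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaMerge {
    // Toy model: a schema is a field-name -> type map. Merging takes the
    // union of all fields and rejects any field seen with two different types.
    static Map<String, String> merge(Map<String, String> a, Map<String, String> b) {
        Map<String, String> merged = new LinkedHashMap<>(a);
        b.forEach((field, type) -> merged.merge(field, type, (t1, t2) -> {
            if (!t1.equals(t2)) {
                throw new IllegalArgumentException(
                    "conflicting types for '" + field + "': " + t1 + " vs " + t2);
            }
            return t1;
        }));
        return merged;
    }

    public static void main(String[] args) {
        // Two part files with overlapping but non-identical schemas.
        Map<String, String> part1 = Map.of("id", "long", "name", "string");
        Map<String, String> part2 = Map.of("id", "long", "score", "double");
        System.out.println(merge(part1, part2));
    }
}
```

In a streaming context the same question applies per newly discovered file, which is why the cost/benefit of the option is worth asking about.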

[Structured Streaming] OOM on ConsoleSink with large inputs

2017-08-11 Thread Gerard Maas
Devs, While investigating another issue, I came across this OOM error when using the Console Sink with any source that can be larger than the available driver memory. In my case, I was using the File source and I had a 14G file in the monitored dir. I traced back the issue to a `df.collect` in
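The fix direction implied here is to bound what reaches the driver: materialize only the first N rows for display instead of collecting everything. A minimal Java sketch of the principle (a LongStream stands in for a dataset larger than driver memory; this is not Spark code):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class BoundedPreview {
    // Take at most maxRows from a potentially unbounded/huge source before
    // materializing anything, so driver memory use stays constant.
    static List<Long> preview(LongStream rows, int maxRows) {
        return rows.limit(maxRows).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A "source" far larger than anything we'd want collected on the driver;
        // limit() keeps it lazy, so only 20 values are ever produced.
        List<Long> shown = preview(LongStream.range(0, 1_000_000_000L), 20);
        System.out.println(shown.size() + " rows shown");
    }
}
```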

Re: Handling questions in the mailing lists

2016-11-09 Thread Gerard Maas
Great discussion. Glad to see it happening and lucky to have seen it on the mailing list due to its high volume. I had this same conversation with Patrick Wendell few Spark Summits ago. At the time, SO was not even listed as a resource and the idea was to make it the primary "go-to" place for

Re: Can we remove private[spark] from Metrics Source and Sink traits?

2016-03-19 Thread Gerard Maas
+1 On Mar 19, 2016 08:33, "Pete Robbins" wrote: > This seems to me to be unnecessarily restrictive. These are very useful > extension points for adding 3rd party sources and sinks. > > I intend to make an Elasticsearch sink available on spark-packages but > this will require

Re: Time is ugly in Spark Streaming....

2015-06-26 Thread Gerard Maas
Are you sharing the SimpleDateFormat instance? This looks a lot more like the non-thread-safe behaviour of SimpleDateFormat (that has claimed many unsuspecting victims over the years), than any 'ugly' Spark Streaming. Try writing the timestamps in millis to Kafka and compare. -kr, Gerard. On
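The hazard Gerard points at is easy to avoid by giving each thread its own formatter. A minimal Java sketch of that pattern (the class and the timestamp are my own illustration, not from the thread):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Set;
import java.util.TimeZone;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SafeFormatting {
    // SimpleDateFormat keeps mutable internal state, so a single shared
    // instance is unsafe under concurrency; give each thread its own copy.
    private static final ThreadLocal<SimpleDateFormat> FMT =
        ThreadLocal.withInitial(() -> {
            SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            f.setTimeZone(TimeZone.getTimeZone("UTC"));
            return f;
        });

    // Format the same instant from many threads; with per-thread formatters
    // every result should be identical.
    static Set<String> formatConcurrently(long millis, int tasks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        Set<String> results = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> results.add(FMT.get().format(new Date(millis))));
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return results;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(formatConcurrently(1_435_276_800_000L, 500));
    }
}
```

Gerard's other suggestion, writing raw millis to Kafka, sidesteps the formatter entirely and is the simpler option when the consumer can do its own formatting.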

Re: Stages with non-arithmetic numbering Timing metrics in event logs

2015-06-11 Thread Gerard Maas
Kay, Excellent write-up. This should be preserved for reference somewhere searchable. -Gerard. On Fri, Jun 12, 2015 at 1:19 AM, Kay Ousterhout k...@eecs.berkeley.edu wrote: Here’s how the shuffle works. This explains what happens for a single task; this will happen in parallel for each

Re: [Streaming] Configure executor logging on Mesos

2015-06-01 Thread Gerard Maas
the spark.executor.uri (or another one) can take more than one downloadable path. my.2¢ andy On Fri, May 29, 2015 at 5:09 PM Gerard Maas gerard.m...@gmail.com wrote: Hi Tim, Thanks for the info. We (Andy Petrella and myself) have been diving a bit deeper into this log config: The log

Re: Registering custom metrics

2015-01-08 Thread Gerard Maas
) bytes } .saveAsTextFile(text) Is there a way to achieve this with the MetricSystem? On Mon, Jan 5, 2015 at 10:24 AM, Gerard Maas gerard.m...@gmail.com wrote: Hi, Yes, I managed to register custom metrics by creating an implementation

Re: Registering custom metrics

2015-01-05 Thread Gerard Maas
Hi, Yes, I managed to register custom metrics by creating an implementation of org.apache.spark.metrics.source.Source and registering it to the metrics subsystem. Source is [Spark] private, so you need to create it under an org.apache.spark package. In my case, I'm dealing with Spark

Tuning Spark Streaming jobs

2014-12-22 Thread Gerard Maas
Hi, After facing issues with the performance of some of our Spark Streaming jobs, we invested quite some effort figuring out the factors that affect the performance characteristics of a Streaming job. We defined an empirical model that helps us reason about Streaming jobs and applied it to tune
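The core of any such empirical model is the stability condition: mean batch processing time must stay below the batch interval, otherwise delay grows without bound. A toy sketch of that check (numbers and method names are illustrative, not from the linked model):

```java
public class StreamingStability {
    // A micro-batch job only keeps up if, on average, it processes a batch
    // faster than new batches arrive. Utilization > 1.0 means batches queue up
    // and total delay grows without bound.
    static double utilization(double meanProcessingMs, double batchIntervalMs) {
        return meanProcessingMs / batchIntervalMs;
    }

    static boolean isStable(double meanProcessingMs, double batchIntervalMs) {
        return utilization(meanProcessingMs, batchIntervalMs) < 1.0;
    }

    public static void main(String[] args) {
        System.out.println(utilization(1500, 2000)); // keeping up, with headroom
        System.out.println(isStable(2500, 2000));    // falling behind
    }
}
```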

Re: Tuning Spark Streaming jobs

2014-12-22 Thread Gerard Maas
mode? I'm making changes to the spark mesos scheduler and I think we can propose a best way to achieve what you mentioned. Tim Sent from my iPhone On Dec 22, 2014, at 8:33 AM, Gerard Maas gerard.m...@gmail.com wrote: Hi, After facing issues with the performance of some of our Spark

Understanding reported times on the Spark UI [+ Streaming]

2014-12-08 Thread Gerard Maas
Hi, I'm confused about the Stage times reported on the Spark-UI (Spark 1.1.0) for a Spark Streaming job. I'm hoping somebody can shine some light on it: Let's do this with an example: On the /stages page, stage # 232 is reported to have lasted 18 seconds: 232 runJob at RDDFunctions.scala:23

Re: Spark Streaming Metrics

2014-11-21 Thread Gerard Maas
Looks like metrics are not a hot topic to discuss - yet so important to sleep well when jobs are running in production. I've created Spark-4537 https://issues.apache.org/jira/browse/SPARK-4537 to track this issue. -kr, Gerard. On Thu, Nov 20, 2014 at 9:25 PM, Gerard Maas gerard.m...@gmail.com

Spark Streaming Metrics

2014-11-20 Thread Gerard Maas
As the Spark Streaming tuning guide indicates, the key indicators of a healthy streaming job are: - Processing Time - Total Delay The Spark UI page for the Streaming job [1] shows these two indicators but the metrics source for Spark Streaming (StreamingSource.scala) [2] does not. Any reasons
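The two indicators relate by simple arithmetic: total delay is scheduling delay plus processing time. A toy Java sketch of the bookkeeping (timestamps and method names are illustrative assumptions, not Spark's StreamingSource API):

```java
public class BatchTimings {
    // For one micro-batch with three timestamps (ms): when it was submitted,
    // when processing started, and when processing finished.
    static long processingTime(long startMs, long endMs) { return endMs - startMs; }

    static long schedulingDelay(long submitMs, long startMs) { return startMs - submitMs; }

    // Total delay = scheduling delay + processing time, by construction.
    static long totalDelay(long submitMs, long endMs) { return endMs - submitMs; }

    public static void main(String[] args) {
        long submit = 1_000, start = 1_250, end = 2_050;
        System.out.println("processing time:  " + processingTime(start, end) + " ms");
        System.out.println("scheduling delay: " + schedulingDelay(submit, start) + " ms");
        System.out.println("total delay:      " + totalDelay(submit, end) + " ms");
    }
}
```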

Registering custom metrics

2014-10-30 Thread Gerard Maas
Hi, I've been exploring the metrics exposed by Spark and I'm wondering whether there's a way to register job-specific metrics that could be exposed through the existing metrics system. Would there be an example somewhere? BTW, documentation about how the metrics work could be improved. I

Using case classes as keys does not seem to work.

2014-07-22 Thread Gerard Maas
Using a case class as a key doesn't seem to work properly. [Spark 1.0.0] A minimal example: case class P(name:String) val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) sc.parallelize(ps).map(x => (x,1)).reduceByKey((x,y) => x+y).collect [Spark shell local mode] res : Array[(P, Int)] =
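The failure described here comes down to the hashCode/equals contract that any key-based grouping relies on; in the Spark shell, classes defined in the REPL can end up violating it after serialization. A plain-Java sketch of the underlying contract (not Spark code; class names are invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class KeyContract {
    // No equals/hashCode: two BadKey("bob") instances are distinct map keys,
    // so counts that should merge stay separate.
    static final class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }
    }

    // Value-based equals/hashCode: duplicates collapse into one key,
    // which is exactly what reduceByKey depends on.
    static final class GoodKey {
        final String name;
        GoodKey(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).name.equals(name);
        }
        @Override public int hashCode() { return Objects.hash(name); }
    }

    static <K> Map<K, Integer> countByKey(K[] keys) {
        Map<K, Integer> counts = new HashMap<>();
        for (K k : keys) counts.merge(k, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        BadKey[] bad = { new BadKey("bob"), new BadKey("bob") };
        GoodKey[] good = { new GoodKey("bob"), new GoodKey("bob") };
        System.out.println("bad keys:  " + countByKey(bad).size() + " entries");
        System.out.println("good keys: " + countByKey(good).size() + " entry");
    }
}
```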

Re: Using case classes as keys does not seem to work.

2014-07-22 Thread Gerard Maas
,ArrayBuffer(1, 1))) On Tue, Jul 22, 2014 at 4:20 PM, Gerard Maas gerard.m...@gmail.com wrote: Using a case class as a key doesn't seem to work properly. [Spark 1.0.0] A minimal example: case class P(name:String) val ps = Array(P("alice"), P("bob"), P("charly"), P("bob")) sc.parallelize(ps).map(x => (x,1

Re: Using case classes as keys does not seem to work.

2014-07-22 Thread Gerard Maas
, 2014 at 5:37 PM, Gerard Maas gerard.m...@gmail.com wrote: Yes, right. 'sc.parallelize(ps).map(x => (x.name,1)).groupByKey().collect' An oversight from my side. Thanks!, Gerard. On Tue, Jul 22, 2014 at 5:24 PM, Daniel Siegmann daniel.siegm...@velos.io wrote: I can confirm this bug

Re: Should SPARK_HOME be needed with Mesos?

2014-05-22 Thread Gerard Maas
send in a pull request that includes your proposed changes? Andrew On Wed, May 21, 2014 at 10:19 AM, Gerard Maas gerard.m...@gmail.com wrote: Spark dev's, I was looking into a question asked on the user list where a ClassNotFoundException was thrown when running a job on Mesos

Re: Should SPARK_HOME be needed with Mesos?

2014-05-22 Thread Gerard Maas
a new ticket for just this particular issue. On Thu, May 22, 2014 at 11:03 AM, Gerard Maas gerard.m...@gmail.com wrote: Sure. Should I create a Jira as well? I saw there's already a broader ticket regarding the ambiguous use of SPARK_HOME [1] (cc: Patrick as owner of that ticket) I don't

Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)

2014-05-21 Thread Gerard Maas
Hi Tobias, I was curious about this issue and tried to run your example on my local Mesos. I was able to reproduce your issue using your current config: [error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task 1.0:4 failed 4 times (most recent failure: Exception failure:

Should SPARK_HOME be needed with Mesos?

2014-05-21 Thread Gerard Maas
Spark dev's, I was looking into a question asked on the user list where a ClassNotFoundException was thrown when running a job on Mesos. Curious issue with serialization on Mesos: more details here [1]: When trying to run that simple example on my Mesos installation, I faced another issue: I got

Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)

2014-05-21 Thread Gerard Maas
for it to work. The SparkREPL works differently. It uses some dark magic to send the working session to the workers. -kr, Gerard. On Wed, May 21, 2014 at 2:47 PM, Gerard Maas gerard.m...@gmail.com wrote: Hi Tobias, I was curious about this issue and tried to run your example on my local

Re: Announcing the official Spark Job Server repo

2014-03-19 Thread Gerard Maas
this is cool +1 On Wed, Mar 19, 2014 at 6:54 PM, Patrick Wendell pwend...@gmail.com wrote: Evan - yep definitely open a JIRA. It would be nice to have a contrib repo set-up for the 1.0 release. On Tue, Mar 18, 2014 at 11:28 PM, Evan Chan e...@ooyala.com wrote: Matei, Maybe it's time