Re: Is memory-only no-disk Spark possible?

2021-08-20 Thread Jack Kolokasis
Hello Jacek, On 20/8/21 2:49 p.m., Jacek Laskowski wrote: Hi, I've been exploring BlockManager and the stores for a while now and am tempted to say that a memory-only Spark setup would be possible (except shuffle blocks). Is this correct? Correct. What about shuffle blocks? Do they have
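By default the sort-based shuffle always writes its map outputs to local disk, so a strictly no-disk setup cannot avoid the shuffle files themselves. A common workaround, offered here as a suggestion rather than something stated in the thread, is to point spark.local.dir at a RAM-backed filesystem so the shuffle blocks never touch a physical disk. A minimal sketch, assuming /mnt/ramdisk is a tmpfs mount you have created yourself:

    import org.apache.spark.sql.SparkSession

    // Sketch: shuffle files are still "written to disk" from Spark's point
    // of view, but the backing storage is RAM. /mnt/ramdisk is a
    // hypothetical tmpfs mount point.
    val spark = SparkSession.builder()
      .appName("memory-only-sketch")
      .config("spark.local.dir", "/mnt/ramdisk") // where shuffle blocks land
      .getOrCreate()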

Re: About how to read spark source code with a good way

2020-08-19 Thread Jack Kolokasis
into the spark source code. Thanks Regards Joyan On Wed, Aug 19, 2020 at 11:06 AM Jack Kolokasis <koloka...@ics.forth.gr> wrote: Hi, From my experience, I suggest reading both blogs and source code. Blogs will give you the high-level knowledge for the different

Re: About how to read spark source code with a good way

2020-08-18 Thread Jack Kolokasis
Hi, From my experience, I suggest reading both blogs and source code. Blogs will give you the high-level knowledge for the different parts of the source code. Iacovos On 18/8/20 3:53 p.m., 2400 wrote: Hi everyone, I am an engineer, I have been using spark, and I want to try to make

Re: Exact meaning of spark.memory.storageFraction in spark 2.3.x

2020-03-20 Thread Jack Kolokasis
On Fri, Mar 20, 2020 at 14:45, Jack Kolokasis <koloka...@ics.forth.gr> wrote: Hello Michel, Spark separates executor memory using an adaptive boundary between

Re: Exact meaning of spark.memory.storageFraction in spark 2.3.x

2020-03-20 Thread Jack Kolokasis
Hello Michel, Spark separates executor memory using an adaptive boundary between storage and execution memory. If there is no caching and execution memory needs more space, then it will use a portion of the storage memory. If your program does not use caching then you can reduce storage
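For reference, these are the two configuration knobs the answer refers to. A minimal sketch with illustrative values (the defaults in Spark 2.3.x are 0.6 and 0.5):

    import org.apache.spark.sql.SparkSession

    // Illustrative values only. With no caching in the job, lowering
    // spark.memory.storageFraction leaves more of the unified region for
    // execution; storage can still borrow it back if caching appears later.
    val spark = SparkSession.builder()
      .appName("storage-fraction-sketch")
      .config("spark.memory.fraction", "0.6")        // unified region's share of heap (default 0.6)
      .config("spark.memory.storageFraction", "0.2") // slice protected for storage (default 0.5)
      .getOrCreate()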

Calculate Task Memory Usage

2019-10-11 Thread Jack Kolokasis
Hello to all, I am trying to calculate how much memory each task in Spark consumes. Is there any way to measure this? Thanks, Iacovos
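One way to get at this, sketched here rather than taken from a reply on the list, is a SparkListener that reads peakExecutionMemory from each finished task's metrics. Note that it covers execution memory for joins, sorts, and aggregations, not every allocation a task makes:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    // Sketch: log each task's peak execution memory (bytes) as tasks finish.
    class TaskMemoryListener extends SparkListener {
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        val m = taskEnd.taskMetrics
        if (m != null) {
          println(s"stage ${taskEnd.stageId} task ${taskEnd.taskInfo.taskId}: " +
            s"peakExecutionMemory=${m.peakExecutionMemory} bytes")
        }
      }
    }

    // Register before running the job:
    // sc.addSparkListener(new TaskMemoryListener)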

Shuffle Spill to Disk

2019-09-28 Thread Jack Kolokasis
Hello, I am trying to measure how many bytes are spilled to disk in a shuffle operation, and I always get zero. This is not correct because the Spark local disk is utilized. Can anyone explain why the spill counter is zero? Thanks, Iacovos
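A likely explanation, offered here as an assumption: shuffle writes always go to local disk but are counted separately from spills, which only increment when in-memory buffers overflow. A listener that prints both counters side by side makes the distinction visible:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    // Sketch: distinguish shuffle *writes* (always hit local disk) from
    // *spills* (only when in-memory structures overflow). A zero spill
    // counter with a busy local disk usually means you are seeing writes.
    class SpillListener extends SparkListener {
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        val m = taskEnd.taskMetrics
        if (m != null) {
          println(s"task ${taskEnd.taskInfo.taskId}: " +
            s"memorySpilled=${m.memoryBytesSpilled} " +
            s"diskSpilled=${m.diskBytesSpilled} " +
            s"shuffleWritten=${m.shuffleWriteMetrics.bytesWritten}")
        }
      }
    }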

Spark and Java10

2019-07-06 Thread Jack Kolokasis
Hello, I am trying to use Apache Spark v2.3.1 with Java 10 but I cannot. The Spark documentation says that Spark works with Java 8+. So, has anyone tried to use Apache Spark with Java 10? Thanks for your help, Iacovos

Re: installation of spark

2019-06-04 Thread Jack Kolokasis
Hello, first you will need to make sure that Java is installed, or install it otherwise. Then install Scala and a build tool (sbt or Maven); a minimal sbt setup is sketched below. In my view, IntelliJ IDEA is a good option for creating your Spark applications. In the end you have to install a distributed file system
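As a concrete starting point, a minimal build.sbt sketch for that setup (version numbers are illustrative for a Spark 2.4-era project):

    // build.sbt -- minimal sbt setup for a Spark application (illustrative versions)
    name := "my-spark-app"
    version := "0.1"
    scalaVersion := "2.11.12"

    // "provided" if you submit to a cluster that already ships the Spark jars
    libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.3" % "provided"
    libraryDependencies += "org.apache.spark" %% "spark-sql"  % "2.4.3" % "provided"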

Re: Difference between Checkpointing and Persist

2019-04-18 Thread Jack Kolokasis
Hi, in my view a good approach is to first persist your data with StorageLevel.MEMORY_AND_DISK and then perform the join. This will accelerate your computation because the data will be present in memory and on your local intermediate storage device. --Iacovos On 4/18/19 8:49 PM, Subash
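A minimal sketch of the suggested pattern (the pair RDDs left and right are hypothetical stand-ins for your data):

    import org.apache.spark.storage.StorageLevel

    // Persist the reused sides before the join so recomputation is avoided
    // and partitions evicted from memory fall back to local disk.
    val left  = sc.parallelize(Seq((1, "a"), (2, "b")))
    val right = sc.parallelize(Seq((1, "x"), (2, "y")))

    left.persist(StorageLevel.MEMORY_AND_DISK)
    right.persist(StorageLevel.MEMORY_AND_DISK)

    val joined = left.join(right)
    joined.count() // action materializes the persisted data and the join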

Load Time from HDFS

2019-04-02 Thread Jack Kolokasis
Hello, I want to ask if there is any way to measure HDFS data loading time at the start of my program. I tried to add an action, e.g. count(), after the val data = sc.textFile() call. But I notice that my program takes more time to finish than before adding the count call. Is there any other way to do
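One rough approach, sketched under the assumption that forcing the full read with a cheap action is acceptable: time the action directly. The measurement still includes the action's own cost, which is also why the job looks slower after adding count() -- the input simply was not read at all before any action ran. The path below is a placeholder:

    // Sketch: approximate the HDFS load time by timing a cheap action that
    // forces the read. "hdfs:///path/to/input" is a placeholder path.
    val t0 = System.nanoTime()
    val data = sc.textFile("hdfs:///path/to/input") // lazy: nothing is read yet
    data.foreach(_ => ())                           // forces a full scan with minimal extra work
    val loadSeconds = (System.nanoTime() - t0) / 1e9
    println(f"approximate load time: $loadSeconds%.2f s")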

Re: Spark Profiler

2019-03-27 Thread Jack Kolokasis
and event listeners are quite useful for that. See also https://github.com/apache/spark/blob/master/docs/monitoring.md and https://github.com/LucaCanali/sparkMeasure Regards, Luca From: manish ranjan Sent: Tuesday, March 26, 2019 15:24 To: Jack Kolokasis Cc: user Subject: Re: Spark
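For reference, a usage sketch of sparkMeasure along the lines of its README (the spark-measure artifact is assumed to be on the classpath, and spark is an active SparkSession):

    // Sketch: collect stage-level metrics (CPU time, shuffle I/O, spill, GC)
    // around a workload, following the sparkMeasure README.
    val stageMetrics = ch.cern.sparkmeasure.StageMetrics(spark)
    stageMetrics.runAndMeasure {
      spark.sql("select count(*) from range(1000 * 1000)").show()
    }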

Spark Profiler

2019-03-26 Thread Jack Kolokasis
Hello all, I am looking for a Spark profiler to trace my application and find the bottlenecks. I need to trace CPU usage, memory usage, and I/O usage. I am looking forward to your reply. --Iacovos

Measure Serialization / De-serialization Time

2018-11-15 Thread Jack Kolokasis
Hello all, I am running a simple Word Count application using storage level MEMORY_ONLY in one case and OFF_HEAP in the other. I see that the execution time when I run my application off-heap is higher than on-heap, so I am looking into where this time goes. My first thought is that
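A rough way to see the difference, sketched as an assumption rather than an answer from the thread: OFF_HEAP stores blocks serialized, so every access pays deserialization, whereas MEMORY_ONLY keeps deserialized objects. A small timing harness around the cached RDD shows where the gap appears ("input.txt" is a placeholder):

    import org.apache.spark.storage.StorageLevel

    // Sketch: time repeated access to the same cached data. Off-heap reads
    // must deserialize every record, which is one place extra time can go.
    def time[T](label: String)(body: => T): T = {
      val t0 = System.nanoTime()
      val result = body
      println(f"$label: ${(System.nanoTime() - t0) / 1e9}%.2f s")
      result
    }

    val words = sc.textFile("input.txt").flatMap(_.split(" ")).map((_, 1))

    val cached = words.persist(StorageLevel.MEMORY_ONLY)
    cached.count() // materialize the cache first
    time("on-heap read")(cached.reduceByKey(_ + _).count())

    // For the OFF_HEAP run, spark.memory.offHeap.enabled=true and
    // spark.memory.offHeap.size must also be set on the SparkConf.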

StorageLevel: OffHeap

2018-11-08 Thread Jack Kolokasis
Hello everyone, I am running a simple word count in Spark and I persist my RDDs using StorageLevel.OFF_HEAP. While the application is running, I see through the Spark Web UI that they are persisted on disk. Why does this happen? Can anyone tell me how the off-heap storage level works? Thanks for
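One detail worth checking, offered as an assumption: StorageLevel.OFF_HEAP also allows disk, and its off-heap memory component is only usable when off-heap memory is enabled and sized in the configuration; otherwise blocks can end up on disk exactly as the Web UI shows. A minimal config sketch:

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel

    // Sketch: OFF_HEAP only has off-heap memory to use when these two
    // settings are present; without them the memory component is
    // effectively unavailable and persisted blocks can land on disk.
    val conf = new SparkConf()
      .setAppName("offheap-sketch")
      .set("spark.memory.offHeap.enabled", "true")
      .set("spark.memory.offHeap.size", "1g")

    // later: rdd.persist(StorageLevel.OFF_HEAP)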