Yeah, perhaps I misunderstood what Michael was saying. But thanks for pointing out the relevant UI functionality.
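For anyone who wants to ballpark this locally, here's a rough sketch of the
null-job measurement Shivaram mentions below. It's only an approximation
(local mode, single-task jobs, names mine), and the timing lumps
scheduling + task launch + result return together, per his description:

    import org.apache.spark.{SparkConf, SparkContext}

    // Each job runs a single no-op task, so the per-job time roughly
    // approximates scheduling + task launch + result return overhead.
    object NullJobTiming {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setMaster("local[1]").setAppName("null-job-timing"))
        val oneTask = sc.parallelize(Seq(0), numSlices = 1)

        (1 to 20).foreach(_ => oneTask.count()) // warm up the JIT and scheduler

        val trials = 200
        val start = System.nanoTime()
        (1 to trials).foreach(_ => oneTask.count())
        val msPerJob = (System.nanoTime() - start) / 1e6 / trials

        println(f"~$msPerJob%.2f ms per null job")
        sc.stop()
      }
    }

Numbers will of course vary with Spark version and hardware.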
On Sat, Nov 8, 2014 at 1:22 AM, Kay Ousterhout <k...@eecs.berkeley.edu> wrote:

> I don't have much more info than what Shivaram said. My sense is that,
> over time, task launch overhead with Spark has slowly grown as Spark
> supports more and more functionality. However, I haven't seen it be as
> high as the 100ms Michael quoted (maybe this was for jobs with tasks that
> have much larger objects that take a long time to deserialize?).
> Fortunately, the UI now quantifies this: if you click "Show Additional
> Metrics", the scheduler delay (which basically represents the overhead of
> shipping the task to the worker and getting the result back), the task
> deserialization time, and the result serialization time all represent
> parts of the task launch overhead. So, you can use the UI to get a sense
> of what this overhead is for the workload you're considering and whether
> it's worth optimizing.
>
> -Kay
>
> On Fri, Nov 7, 2014 at 9:43 PM, Shivaram Venkataraman <
> shiva...@eecs.berkeley.edu> wrote:
>
>> I think Kay might be able to give a better answer. The most recent
>> benchmark I remember had the number at somewhere between 8.6ms and
>> 14.6ms depending on the Spark version
>> (https://github.com/apache/spark/pull/2030#issuecomment-52715181).
>> Another point to note is that this is the total time to run a null job,
>> so this includes scheduling + task launch + time to send back results
>> etc.
>>
>> Shivaram
>>
>> On Fri, Nov 7, 2014 at 9:23 PM, Nicholas Chammas <
>> nicholas.cham...@gmail.com> wrote:
>>
>>> Hmm, relevant quote from section 3.3:
>>>
>>>> newer frameworks like Spark [35] reduce the overhead to 5ms. To
>>>> support tasks that complete in hundreds of milliseconds, we argue for
>>>> reducing task launch overhead even further to 1ms so that launch
>>>> overhead constitutes at most 1% of task runtime. By maintaining an
>>>> active thread pool for task execution on each worker node and caching
>>>> binaries, task launch overhead can be reduced to the time to make a
>>>> remote procedure call to the slave machine to launch the task. Today's
>>>> datacenter networks easily allow an RPC to complete within 1ms. In
>>>> fact, recent work showed that 10μs RPCs are possible in the short term
>>>> [26]; thus, with careful engineering, we believe task launch overheads
>>>> of 50μs are attainable. 50μs task launch overheads would enable even
>>>> smaller tasks that could read data from in-memory or from flash
>>>> storage in order to complete in milliseconds.
>>>
>>> So it looks like I misunderstood the current cost of task
>>> initialization. It's already as low as 5ms (and not 100ms)?
>>>
>>> Nick
>>>
>>> On Fri, Nov 7, 2014 at 11:15 PM, Shivaram Venkataraman <
>>> shiva...@eecs.berkeley.edu> wrote:
>>>
>>>> On Fri, Nov 7, 2014 at 8:04 PM, Nicholas Chammas <
>>>> nicholas.cham...@gmail.com> wrote:
>>>>
>>>>> Sounds good. I'm looking forward to tracking improvements in this
>>>>> area.
>>>>>
>>>>> Also, just to connect some more dots here, I just remembered that
>>>>> there is currently an initiative to add an IndexedRDD
>>>>> <https://issues.apache.org/jira/browse/SPARK-2365> interface. Some
>>>>> interesting use cases mentioned there include (emphasis added):
>>>>>
>>>>>> To address these problems, we propose IndexedRDD, an efficient
>>>>>> key-value store built on RDDs. IndexedRDD would extend RDD[(Long, V)]
>>>>>> by enforcing key uniqueness and pre-indexing the entries for
>>>>>> efficient joins and *point lookups, updates, and deletions*.
>>>>>> GraphX would be the first user of IndexedRDD, since it currently
>>>>>> implements a limited form of this functionality in VertexRDD. We
>>>>>> envision a variety of other uses for IndexedRDD, including *streaming
>>>>>> updates* to RDDs, *direct serving* from RDDs, and as an execution
>>>>>> strategy for Spark SQL.
>>>>>
>>>>> Maybe some day we'll have Spark clusters directly serving up point
>>>>> lookups or updates. I imagine the tasks running on clusters like that
>>>>> would be tiny and would benefit from very low task startup times and
>>>>> scheduling latency. Am I painting that picture correctly?
>>>>>
>>>> Yeah - we painted a similar picture in a short paper last year titled
>>>> "The Case for Tiny Tasks in Compute Clusters":
>>>> http://shivaram.org/publications/tinytasks-hotos13.pdf
>>>>
>>>>> Anyway, thanks for explaining the current status of Sparrow.
>>>>>
>>>>> Nick
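P.S. To make the "point lookups / direct serving" picture above a bit more
concrete: you can already get a taste of it with today's API when a pair
RDD has a known partitioner, since lookup() then only scans the one
partition that owns the key. A minimal sketch (local mode assumed; this is
plain RDDs, not the proposed IndexedRDD, which would add real indexing on
top of this access pattern):

    import org.apache.spark.SparkContext._ // pair-RDD implicits on older Spark versions
    import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

    object PointLookupSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setMaster("local[4]").setAppName("point-lookup-sketch"))

        // Pre-partition by key and cache, so each lookup knows which
        // partition owns a key and runs a single small task against it.
        val users = sc.parallelize(1L to 100000L)
          .map(id => (id, s"user-$id"))
          .partitionBy(new HashPartitioner(16))
          .cache()
        users.count() // materialize the cache

        // With a partitioner set, lookup() runs a one-task job on the
        // owning partition -- exactly the case where low task launch
        // overhead would matter for serving.
        println(users.lookup(42L))
        sc.stop()
      }
    }

Each such lookup is still a full (tiny) Spark job, which is why the task
launch overhead numbers discussed in this thread bound how fast that kind
of serving could ever be.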