Re: [jira] [Created] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing

Andrew Ash Sun, 18 May 2014 12:20:03 -0700

The nice thing about putting discussion on the JIRA is that everything
about the bug is in one place. People trying to understand the
discussion a few years from now only have to look at the JIRA ticket, rather
than also searching the mailing list archives and hoping commenters all put the
string "SPARK-1855" in their messages.



On Sun, May 18, 2014 at 10:34 AM, Jacek Laskowski <[email protected]> wrote:

> Hi,
>
> I'm curious whether it's common practice to hold discussions in JIRA rather
> than here. I don't think that's the ASF way.
>
> Regards,
> Jacek Laskowski
> http://blog.japila.pl
> On 17 May 2014 23:55, "Matei Zaharia" <[email protected]> wrote:
>
> > We do actually have replicated StorageLevels in Spark. You can use
> > MEMORY_AND_DISK_2 or construct your own StorageLevel with your own custom
> > replication factor.
> >
> > BTW, you guys should probably have this discussion on the JIRA rather than
> > the dev list; I think the replies somehow ended up on the dev list.
> >
> > Matei
> >
> > On May 17, 2014, at 1:36 AM, Mridul Muralidharan <[email protected]> wrote:
> >
> > > We don't have 3x replication in Spark :-)
> > > And if we use a replicated StorageLevel, it decreases the odds of failure
> > > but does not eliminate them (since we are not doing a great job with
> > > replication from a fault-tolerance point of view anyway).
> > > Replicated levels also take a nontrivial performance hit.
> > >
> > > Regards,
> > > Mridul
> > > On 17-May-2014 8:16 am, "Xiangrui Meng" <[email protected]> wrote:
> > >
> > >> With 3x replication, we should be able to achieve fault tolerance.
> > >> This checkpointed RDD can be cleared if we have another in-memory
> > >> checkpointed RDD further down the line. It can avoid hitting disk if
> > >> enough memory is available. We need to investigate more to find a good
> > >> solution. -Xiangrui
> > >>
> > >> On Fri, May 16, 2014 at 4:00 PM, Mridul Muralidharan <[email protected]> wrote:
> > >>> Effectively, this is persist without fault tolerance:
> > >>> the failure of any node means a complete loss of fault tolerance.
> > >>> I would be very skeptical of truncating lineage if the checkpoint is
> > >>> not reliable.
> > >>> On 17-May-2014 3:49 am, "Xiangrui Meng (JIRA)" <[email protected]> wrote:
> > >>>
> > >>>> Xiangrui Meng created SPARK-1855:
> > >>>> ------------------------------------
> > >>>>
> > >>>>             Summary: Provide memory-and-local-disk RDD checkpointing
> > >>>>                 Key: SPARK-1855
> > >>>>                 URL: https://issues.apache.org/jira/browse/SPARK-1855
> > >>>>             Project: Spark
> > >>>>          Issue Type: New Feature
> > >>>>          Components: MLlib, Spark Core
> > >>>>    Affects Versions: 1.0.0
> > >>>>            Reporter: Xiangrui Meng
> > >>>>
> > >>>>
> > >>>> Checkpointing is used to cut long lineage while maintaining fault
> > >>>> tolerance. The current implementation is HDFS-based. Using the BlockRDD,
> > >>>> we can create in-memory-and-local-disk checkpoints (with replication)
> > >>>> that are not as reliable as the HDFS-based solution but are faster.
> > >>>>
> > >>>> It can help applications that require many iterations.
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> This message was sent by Atlassian JIRA
> > >>>> (v6.2#6252)
> > >>>>
> > >>
> >
> >
>
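Mridul's point that a replicated StorageLevel lowers the odds of losing a checkpoint without eliminating them can be made concrete with a back-of-envelope model. This is a hypothetical illustration, not anything in Spark: it assumes each node fails independently with probability p during a job, that a block's r replicas sit on r distinct nodes (so one block is lost with probability p^r), and, as a further simplification, that the n blocks of a checkpoint are lost independently of one another. The function names and parameter values are made up for illustration.

```python
def block_loss_probability(p: float, r: int) -> float:
    """Probability that a single block loses all r of its replicas.

    Hypothetical model: each replica lives on a distinct node, and nodes
    fail independently with probability p during the job.
    """
    return p ** r


def checkpoint_loss_probability(p: float, r: int, n_blocks: int) -> float:
    """Probability that at least one of n_blocks checkpoint blocks is lost.

    Simplification: treats blocks as independent, although in a real
    cluster blocks stored on the same nodes would fail together.
    """
    q = block_loss_probability(p, r)
    return 1.0 - (1.0 - q) ** n_blocks


if __name__ == "__main__":
    # Made-up parameters: 1% per-node failure chance, 1000 checkpoint blocks.
    for r in (1, 2, 3):
        prob = checkpoint_loss_probability(0.01, r, 1000)
        print(f"replication={r}: P(lose part of checkpoint) = {prob:.6f}")
```

Under these assumptions, raising the replication factor shrinks the loss probability by orders of magnitude but never makes it zero, which is the concern about truncating lineage on top of a checkpoint that is not fully reliable.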