Hi Xuannan,

Suppose a user submits Job 1, which generates a cached intermediate result,
and later submits Job 2, which should ideally read that intermediate result.
If Job 2 fails because the intermediate result is missing, it should be
retried with its full DAG, and when it runs again it will also re-generate
the cache. However, once Job 2 has fallen back to the original DAG,
shouldn't it simply be treated as an ordinary job that follows the normal
recovery strategy? Having a separate configuration seems a little confusing.
In other words, re-generating the cache is just a byproduct of running the
full DAG of Job 2, not its main purpose. It is just like when Job 1 runs to
generate the cache: there is no separate retry config to make sure the cache
is generated. If Job 1 fails, it simply fails like an ordinary job.
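
To make the scenario concrete, here is a rough sketch of the kind of user
program I have in mind. The cache() call (and the invalidateCache() call
mentioned in the comments) is the method proposed in the FLIP; everything
else roughly follows the ordinary Table API, and the "Orders" table and the
aggregation are placeholders I made up purely for illustration:

import static org.apache.flink.table.api.Expressions.$;

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class CachedTableExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inBatchMode().build());

        // "Orders" is assumed to be registered in the catalog already.
        Table orders = tEnv.from("Orders");

        // Job 1: an expensive aggregation whose result is marked for caching.
        // Per the FLIP, the first job that executes the cached table also
        // materializes the intermediate result.
        Table aggregated = orders
                .groupBy($("user"))
                .select($("user"), $("amount").sum().as("total"));
        Table cached = aggregated.cache();   // method proposed in FLIP-36
        cached.execute().print();            // Job 1 runs and creates the cache

        // Job 2: builds on the cached table and should read the intermediate
        // result. If that result is gone (explicit invalidateCache(), a closed
        // TableEnvironment, or a TM failure), Job 2 falls back to its full DAG
        // and re-generates the cache only as a byproduct.
        cached.filter($("total").isGreater(100))
                .execute()
                .print();
    }
}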

What do you think?

Thanks,

Jiangjie (Becket) Qin

On Fri, Apr 24, 2020 at 5:00 PM Xuannan Su <suxuanna...@gmail.com> wrote:

> Hi Becket,
>
> The intermediate result will indeed be automatically re-generated by
> resubmitting the original DAG. And that job could fail as well. In that
> case, we need to decide if we should resubmit the original DAG to
> re-generate the intermediate result or give up and throw an exception to
> the user. And the config is to indicate how many resubmits should happen
> before giving up.
>
> Thanks,
> Xuannan
>
> On Fri, Apr 24, 2020 at 4:19 PM Becket Qin <becket....@gmail.com> wrote:
>
> > Hi Xuannan,
> >
> > > I am not entirely sure if I understand the cases you mentioned. Users
> > > can use the cached table object returned by the .cache() method in
> > > other jobs, and it should read the intermediate result. The
> > > intermediate result can be gone in the following three cases: 1. the
> > > user explicitly calls the invalidateCache() method, 2. the
> > > TableEnvironment is closed, 3. a failure happens on the TM. When that
> > > happens, the intermediate result will not be available unless it is
> > > re-generated.
> >
> >
> > What confused me was why we need to have a *cache.retries.max* config.
> > Shouldn't the missing intermediate result always be automatically
> > re-generated if it is gone?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> > On Fri, Apr 24, 2020 at 3:59 PM Xuannan Su <suxuanna...@gmail.com>
> wrote:
> >
> > > Hi Becket,
> > >
> > > Thanks for the comments.
> > >
> > > On Fri, Apr 24, 2020 at 9:12 AM Becket Qin <becket....@gmail.com>
> wrote:
> > >
> > > > Hi Xuannan,
> > > >
> > > > Thanks for picking up the FLIP. It looks good to me overall. Some
> quick
> > > > comments / questions below:
> > > >
> > > > 1. Do we also need changes in the Java API?
> > > >
> > >
> > > Yes, the changes to the public interfaces of Table and TableEnvironment
> > > will be made in the Java API.
> > >
> > >
> > > > 2. What are the cases that users may want to retry reading the
> > > > intermediate result? It seems that once the intermediate result is
> > > > gone, it will not be available later without being generated again,
> > > > right?
> > > >
> > >
> > > I am not entirely sure if I understand the cases you mentioned. Users
> > > can use the cached table object returned by the .cache() method in
> > > other jobs, and it should read the intermediate result. The
> > > intermediate result can be gone in the following three cases: 1. the
> > > user explicitly calls the invalidateCache() method, 2. the
> > > TableEnvironment is closed, 3. a failure happens on the TM. When that
> > > happens, the intermediate result will not be available unless it is
> > > re-generated.
> > >
> > > > 3. In the "semantic of cache() method" section, the description "The
> > > > semantic of the *cache()* method is a little different depending on
> > > > whether auto caching is enabled or not." does not seem to be explained.
> > > >
> > >
> > > This line is actually outdated and should be removed, as we are not
> > > adding the auto caching functionality in this FLIP. Auto caching will
> > > be added in the future, and the semantics of cache() when auto caching
> > > is enabled will be discussed in detail in a new FLIP. I will remove the
> > > description to avoid further confusion.
> > >
> > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > >
> > > >
> > > > On Wed, Apr 22, 2020 at 4:00 PM Xuannan Su <suxuanna...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi folks,
> > > > >
> > > > > I'd like to start the discussion about FLIP-36 Support Interactive
> > > > > Programming in Flink Table API
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
> > > > >
> > > > > The FLIP proposes to add support for interactive programming in the
> > > > > Flink Table API. Specifically, it lets users cache intermediate
> > > > > results (tables) and use them in later jobs.
> > > > >
> > > > > Even though the FLIP has been discussed in the past[1], it hasn't
> > > > > formally passed the vote yet. And some of the design and
> > > > > implementation details have to change to incorporate the cluster
> > > > > partitions proposed in FLIP-67[2].
> > > > >
> > > > > Looking forward to your feedback.
> > > > >
> > > > > Thanks,
> > > > > Xuannan
> > > > >
> > > > > [1]
> > > > > https://lists.apache.org/thread.html/b372fd7b962b9f37e4dace3bc8828f6e2a2b855e56984e58bc4a413f@%3Cdev.flink.apache.org%3E
> > > > > [2]
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-67%3A+Cluster+partitions+lifecycle
> > > > >
> > > >
> > >
> >
>
