Re: MultiTransactionJobletEventListenerFactory

abdullah alamoudi Fri, 17 Nov 2017 09:36:54 -0800

Also, I was wondering how you would do the same for a single dataset
(non-feed). How would you get the transaction id and change it when you
re-run?


On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <[email protected]> wrote:

> For atomic transactions, the change was merged yesterday. For entity-level
> transactions, it should be a very small change.
>
> Cheers,
> Murtadha
>
> > On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <[email protected]> wrote:
> >
> > I understand that is not the case right now, but is that what you're working on?
> >
> > Cheers,
> > Abdullah.
> >
> >
> >> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <[email protected]> wrote:
> >>
> >> A transaction context can register multiple primary indexes.
> >> Since each entity commit log contains the dataset id, you can decrement
> >> the active operations on the operation tracker associated with that
> >> dataset id.
> >>
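The routing described above can be sketched in a few lines of Java. This is a minimal, hypothetical model, not AsterixDB code: the names OperationTracker, EntityCommitLog, TransactionContext, registerPrimaryIndex and onEntityCommitted are illustrative assumptions. It relies only on what the message states, namely that one transaction context can register several primary indexes and that each entity commit log record carries the dataset id, so the tracker to decrement can be looked up by that id even when all datasets share one transaction id.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model; these are not AsterixDB's actual classes.
public class CommitRouting {

    // Per-dataset count of in-flight operations.
    static final class OperationTracker {
        private final AtomicInteger activeOperations = new AtomicInteger();

        void beforeOperation() { activeOperations.incrementAndGet(); }
        void completeOperation() { activeOperations.decrementAndGet(); }
        int active() { return activeOperations.get(); }
    }

    // An entity commit log record carries the dataset id next to the txn id.
    record EntityCommitLog(long txnId, int datasetId) { }

    // One transaction context that may span several primary indexes.
    static final class TransactionContext {
        private final long txnId;
        private final Map<Integer, OperationTracker> trackersByDataset = new HashMap<>();

        TransactionContext(long txnId) { this.txnId = txnId; }

        void registerPrimaryIndex(int datasetId, OperationTracker tracker) {
            trackersByDataset.put(datasetId, tracker);
        }

        // Use the dataset id in the log record to pick which tracker to decrement.
        void onEntityCommitted(EntityCommitLog log) {
            if (log.txnId() != txnId) {
                throw new IllegalArgumentException("log belongs to another transaction");
            }
            OperationTracker tracker = trackersByDataset.get(log.datasetId());
            if (tracker == null) {
                throw new IllegalStateException("dataset not registered: " + log.datasetId());
            }
            tracker.completeOperation();
        }
    }

    public static void main(String[] args) {
        OperationTracker ds1 = new OperationTracker();
        OperationTracker ds2 = new OperationTracker();
        TransactionContext ctx = new TransactionContext(42L); // one shared txn id
        ctx.registerPrimaryIndex(1, ds1);
        ctx.registerPrimaryIndex(2, ds2);

        ds1.beforeOperation();
        ds2.beforeOperation();
        ds2.beforeOperation();

        ctx.onEntityCommitted(new EntityCommitLog(42L, 2)); // commit for dataset 2 only
        System.out.println(ds1.active() + " " + ds2.active()); // prints "1 1"
    }
}
```

The design point is that the transaction id alone cannot distinguish datasets, but the (txn id, dataset id) pair in each log record can.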
> >> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <[email protected]> wrote:
> >>
> >>   Can you illustrate how a deadlock can happen? I am anxious to know.
> >>   Moreover, the reason for the multiple transaction ids in feeds is not
> >>   simply that we compile them differently.
> >>
> >>   How would a commit operator know which dataset's active operation
> >>   counter to decrement if, for example, they share the same id?
> >>
> >>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <[email protected]> wrote:
> >>>
> >>> Yes. That deadlock could happen. Currently, we have a one-to-one
> >>> mapping between jobs and transactions, except for feeds.
> >>>
> >>> @Abdullah, after some digging into the code, I think we could probably
> >>> use a single transaction id for a job that feeds multiple datasets.
> >>> Let me see if I can convince you. :)
> >>>
> >>> The reason we have multiple transaction ids in feeds is that we compile
> >>> each connection job separately and combine them into a single feed job.
> >>> A new transaction id is created and assigned to each connection job, so
> >>> for the combined job we have to handle the different transactions, as
> >>> they are embedded in the connection job specifications. But what if we
> >>> create a single transaction id for the combined job? That transaction id
> >>> would be embedded into each connection so they can write logs freely,
> >>> but the transaction would be started and committed only once, as there
> >>> is only one feed job. This way, we won't need
> >>> MultiTransactionJobletEventListener, and the transaction id can easily
> >>> be removed from the job specification as well (for Steven's change).
> >>>
> >>> Best,
> >>> Xikui
> >>>
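The proposal above can be sketched as follows, assuming a simplified model in which compiling a connection just produces a spec tagged with a transaction id. ConnectionSpec, RecordingTxnManager, compileConnections and runCombinedFeedJob are hypothetical names, not AsterixDB or Hyracks APIs, and the dataset names are made up. Every connection is compiled with the same job-level id, so the combined job begins and commits exactly once while each connection still logs under that id.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the proposal; not AsterixDB or Hyracks APIs.
public class SingleTxnFeedJob {

    // A compiled connection job: its target dataset and the txn id it logs under.
    record ConnectionSpec(String targetDataset, long txnId) { }

    // Stand-in transaction manager that only records what happens, in order.
    static final class RecordingTxnManager {
        final List<String> events = new ArrayList<>();

        void begin(long txnId) { events.add("begin " + txnId); }
        void log(long txnId, String dataset) { events.add("log " + txnId + " " + dataset); }
        void commit(long txnId) { events.add("commit " + txnId); }
    }

    // Embed the same job-level txn id in every connection instead of a fresh id each.
    static List<ConnectionSpec> compileConnections(List<String> datasets, long jobTxnId) {
        List<ConnectionSpec> specs = new ArrayList<>();
        for (String dataset : datasets) {
            specs.add(new ConnectionSpec(dataset, jobTxnId));
        }
        return specs;
    }

    // Job-level hooks: begin once when the feed job starts, commit once when it ends.
    static void runCombinedFeedJob(RecordingTxnManager tm, long jobTxnId,
                                   List<ConnectionSpec> connections) {
        tm.begin(jobTxnId);
        for (ConnectionSpec c : connections) {
            // Each connection writes its log records under the id embedded in its spec.
            tm.log(c.txnId(), c.targetDataset());
        }
        tm.commit(jobTxnId);
    }

    public static void main(String[] args) {
        RecordingTxnManager tm = new RecordingTxnManager();
        List<ConnectionSpec> conns = compileConnections(List.of("Tweets", "Users"), 7L);
        runCombinedFeedJob(tm, 7L, conns);
        System.out.println(tm.events); // prints [begin 7, log 7 Tweets, log 7 Users, commit 7]
    }
}
```

This only shows the begin-once/commit-once shape; it does not by itself answer how a commit operator tells the datasets apart, which is the question raised in the replies.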
> >>>
> >>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <[email protected]> wrote:
> >>>>
> >>>> I worry about deadlocks. The waits-for graph may not understand that
> >>>> making t1 wait will also make t2 wait, since they may share a thread,
> >>>> right? Or do we now represent jobs and transactions separately there?
> >>>>
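The concern above can be illustrated with a toy waits-for graph. In this hypothetical Java sketch (not the AsterixDB lock manager; class and method names are illustrative assumptions), t1 waits for a lock held by t3 and t3 waits for a lock held by t2. If t1 and t2 share a thread, t2 cannot run while t1 is blocked, which amounts to an implicit edge t2 -> t1. A detector that sees only lock edges finds no cycle; one that also models the shared thread finds the deadlock.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy waits-for graph; not AsterixDB's lock manager or deadlock detector.
public class WaitsForGraph {
    // waiter -> transactions it waits for because they hold a lock it needs
    private final Map<String, Set<String>> lockWaits = new HashMap<>();
    // transaction -> other transactions running on the same thread
    private final Map<String, Set<String>> threadMates = new HashMap<>();

    void addLockWait(String waiter, String holder) {
        lockWaits.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    void shareThread(String a, String b) {
        threadMates.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        threadMates.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Lock edges, plus (optionally) an edge mate -> t for every blocked t,
    // because a thread-mate of a blocked transaction cannot run either.
    private Map<String, Set<String>> edges(boolean modelSharedThreads) {
        Map<String, Set<String>> result = new HashMap<>();
        lockWaits.forEach((w, holders) ->
                result.computeIfAbsent(w, k -> new HashSet<>()).addAll(holders));
        if (modelSharedThreads) {
            for (String blocked : lockWaits.keySet()) {
                for (String mate : threadMates.getOrDefault(blocked, Set.of())) {
                    result.computeIfAbsent(mate, k -> new HashSet<>()).add(blocked);
                }
            }
        }
        return result;
    }

    // Depth-first search for a cycle; a cycle in the waits-for graph is a deadlock.
    boolean hasDeadlock(boolean modelSharedThreads) {
        Map<String, Set<String>> g = edges(modelSharedThreads);
        Set<String> finished = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String start : g.keySet()) {
            if (reachesCycle(start, g, finished, onPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean reachesCycle(String node, Map<String, Set<String>> g,
                                        Set<String> finished, Set<String> onPath) {
        if (onPath.contains(node)) return true;    // back edge: cycle found
        if (finished.contains(node)) return false; // already fully explored
        onPath.add(node);
        for (String next : g.getOrDefault(node, Set.of())) {
            if (reachesCycle(next, g, finished, onPath)) return true;
        }
        onPath.remove(node);
        finished.add(node);
        return false;
    }

    public static void main(String[] args) {
        WaitsForGraph g = new WaitsForGraph();
        g.addLockWait("t1", "t3"); // t1 waits for a lock held by t3
        g.addLockWait("t3", "t2"); // t3 waits for a lock held by t2
        g.shareThread("t1", "t2"); // t1 and t2 run on the same thread
        System.out.println(g.hasDeadlock(false)); // prints false: lock edges alone
        System.out.println(g.hasDeadlock(true));  // prints true: t2 is stuck behind t1
    }
}
```

Sharing a thread alone adds no edges; the implicit edge only appears once one of the thread-mates is actually blocked on a lock.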
> >>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <[email protected]> wrote:
> >>>>>
> >>>>> We are using multiple transactions in a single job in the case of
> >>>>> feeds, and I think that this is the correct way.
> >>>>> Having a single job for a feed that feeds into multiple datasets is a
> >>>>> good thing, since job resources/feed resources are consolidated.
> >>>>>
> >>>>> Here are some points:
> >>>>> - We can't use the same transaction id to feed multiple datasets. The
> >>>>> only other option is to have multiple jobs, each feeding a different
> >>>>> dataset.
> >>>>> - Having multiple jobs (in addition to the extra memory and CPU they
> >>>>> use) would then force us either to read data from external sources
> >>>>> multiple times, parse records multiple times, etc., or to synchronize
> >>>>> the different jobs and the feed source within AsterixDB. IMO, this is
> >>>>> far more complicated than having multiple transactions within a single
> >>>>> job, and the costs far outweigh the benefits.
> >>>>>
> >>>>> P.S.
> >>>>> We are also using this for bucket connections in Couchbase Analytics.
> >>>>>
> >>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <[email protected]> wrote:
> >>>>>>
> >>>>>> If there are a number of issues with supporting multiple transaction
> >>>>>> ids and no clear benefits/use cases, I’d vote for simplification :)
> >>>>>> Also, code that’s not being used has a tendency to "rot", so I think
> >>>>>> its usefulness might be limited by the time we’d find a use for this
> >>>>>> functionality.
> >>>>>>
> >>>>>> My 2c,
> >>>>>> Till
> >>>>>>
> >>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> >>>>>>>
> >>>>>>> I'm separating the connections into different jobs in some of my
> >>>>>>> experiments, but that is intended only for experimental settings
> >>>>>>> (i.e., not for master at this point).
> >>>>>>>
> >>>>>>> I think the interesting question here is whether we want to allow one
> >>>>>>> Hyracks job to carry multiple transactions. I personally think that
> >>>>>>> should be allowed, since transactions and jobs are two separate
> >>>>>>> concepts, but I couldn't find any such use case other than feeds.
> >>>>>>> Does anyone have a good example of this?
> >>>>>>>
> >>>>>>> Another question is: if we do allow multiple transactions in a single
> >>>>>>> Hyracks job, how do we enable the commit runtime to obtain the correct
> >>>>>>> TXN id without having it embedded in the job specification?
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Xikui
> >>>>>>>
> >>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> I am curious how feeds will work without this.
> >>>>>>>>
> >>>>>>>> ~Abdullah.
> >>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory, which
> >>>>>>>>> allows one Hyracks job to run multiple Asterix transactions together.
> >>>>>>>>>
> >>>>>>>>> This class is only used by feeds, and feeds are in the process of
> >>>>>>>>> changing so that they no longer need this feature. As part of the
> >>>>>>>>> work on pre-deploying job specifications to be used by multiple
> >>>>>>>>> Hyracks jobs, I've been working on removing the transaction id from
> >>>>>>>>> the job specifications, since we use a new transaction for each
> >>>>>>>>> invocation of a deployed job.
> >>>>>>>>>
> >>>>>>>>> There is currently no clear way to remove the transaction id from
> >>>>>>>>> the job spec and keep the option for
> >>>>>>>>> MultiTransactionJobletEventListenerFactory.
> >>>>>>>>>
> >>>>>>>>> The question for the group is: do we see a need to maintain this
> >>>>>>>>> class, which will no longer be used by any current code? Or, in
> >>>>>>>>> other words, is there a strong possibility that in the future we
> >>>>>>>>> will want multiple transactions to share a single Hyracks job,
> >>>>>>>>> making it worth figuring out how to maintain this class?
> >>>>>>>>>
> >>>>>>>>> Steven
> >>>>>>>>
> >>>>>>>>
> >>>>>
> >>>>>
> >>>>
> >>
> >>
> >>
> >>
> >
>
