Maybe it would be good to have a meeting about this with all interested parties?
I can be on-campus at UCI on Tuesday if that would be a good day to meet.

Steven

On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <[email protected]> wrote:

> Also, I was wondering how you would do the same for a single dataset (non-feed). How would you get the transaction id and change it when you re-run?

On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <[email protected]> wrote:

> For atomic transactions, the change was merged yesterday. For entity-level transactions, it should be a very small change.
>
> Cheers,
> Murtadha

On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <[email protected]> wrote:

> I understand that is not the case right now, but is that what you're working on?
>
> Cheers,
> Abdullah.

On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <[email protected]> wrote:

> A transaction context can register multiple primary indexes. Since each entity commit log contains the dataset id, you can decrement the active operations on the operation tracker associated with that dataset id.

On 17/11/2017, 5:52 PM, "abdullah alamoudi" <[email protected]> wrote:

> Can you illustrate how a deadlock can happen? I am anxious to know. Moreover, the reason for the multiple transaction ids in feeds is not simply because we compile them differently.
>
> How would a commit operator know which dataset's active operation counter to decrement if they share the same id, for example?

On Nov 16, 2017, at 9:46 PM, Xikui Wang <[email protected]> wrote:

> Yes. That deadlock could happen. Currently, we have one-to-one mappings between jobs and transactions, except for the feeds.
>
> @Abdullah, after some digging into the code, I think we can probably use a single transaction id for the job that feeds multiple datasets? See if I can convince you. :)
>
> The reason we have multiple transaction ids in feeds is that we compile each connection job separately and combine them into a single feed job. A new transaction id is created and assigned to each connection job, so for the combined job we have to handle the different transactions as they are embedded in the connection job specifications. But what if we create a single transaction id for the combined job? That transaction id will be embedded into each connection so they can write logs freely, but the transaction will be started and committed only once, as there is only one feed job. In this way, we won't need MultiTransactionJobletEventListener, and the transaction id can be removed from the job specification easily as well (for Steven's change).
>
> Best,
> Xikui

On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <[email protected]> wrote:

> I worry about deadlocks. The waits-for graph may not understand that making t1 wait will also make t2 wait, since they may share a thread - right? Or do we have jobs and transactions separately represented there now?

On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <[email protected]> wrote:

> We are using multiple transactions in a single job in the case of feeds, and I think that this is the correct way. Having a single job for a feed that feeds into multiple datasets is a good thing since job resources/feed resources are consolidated.
>
> Here are some points:
> - We can't use the same transaction id to feed multiple datasets. The only other option is to have multiple jobs, each feeding a different dataset.
> - Having multiple jobs (in addition to the extra resources used, memory and CPU) would then force us to either read data from external sources multiple times, parse records multiple times, etc., or have synchronization between the different jobs and the feed source within AsterixDB. IMO, this is far more complicated than having multiple transactions within a single job, and the costs far outweigh the benefits.
>
> P.S. We are also using this for bucket connections in Couchbase Analytics.

On Nov 16, 2017, at 2:57 PM, Till Westmann <[email protected]> wrote:

> If there are a number of issues with supporting multiple transaction ids and no clear benefits/use cases, I'd vote for simplification :)
> Also, code that's not being used has a tendency to "rot", and so I think that its usefulness might be limited by the time we'd find a use for this functionality.
>
> My 2c,
> Till

On 16 Nov 2017, at 13:57, Xikui Wang wrote:

> I'm separating the connections into different jobs in some of my experiments... but that was intended to be used for the experimental settings (i.e., not for master now)...
>
> I think the interesting question here is whether we want to allow one Hyracks job to carry multiple transactions. I personally think that should be allowed, as the transaction and the job are two separate concepts, but I couldn't find such use cases other than the feeds. Does anyone have a good example of this?
>
> Another question is, if we do allow multiple transactions in a single Hyracks job, how do we enable the commit runtime to obtain the correct TXN id without having it embedded as part of the job specification?
>
> Best,
> Xikui

On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <[email protected]> wrote:

> I am curious as to how feeds will work without this?
>
> ~Abdullah.

On Nov 16, 2017, at 12:43 PM, Steven Jacobs <[email protected]> wrote:

> Hi all,
> We currently have MultiTransactionJobletEventListenerFactory, which allows one Hyracks job to run multiple Asterix transactions together.
>
> This class is only used by feeds, and feeds are in the process of changing to no longer need this feature. As part of the work on pre-deploying job specifications to be used by multiple Hyracks jobs, I've been working on removing the transaction id from the job specifications, as we use a new transaction for each invocation of a deployed job.
>
> There is currently no clear way to remove the transaction id from the job spec and keep the option for MultiTransactionJobletEventListenerFactory.
>
> The question for the group is, do we see a need to maintain this class that will no longer be used by any current code? Or, in other words, is there a strong possibility that in the future we will want multiple transactions to share a single Hyracks job, meaning that it is worth figuring out how to maintain this class?
>
> Steven
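One way to picture Murtadha's point above (a single transaction context can cover several primary indexes, and the dataset id carried in each entity commit log tells the commit runtime which operation tracker to decrement) is the minimal sketch below. This is a hypothetical, plain-Java illustration only: the names TransactionContext, OperationTracker, entityCommit, and the dataset ids are invented for the sketch and are not the actual AsterixDB/Hyracks API.

import java.util.HashMap;
import java.util.Map;

public class SharedTxnCommitSketch {

    // Stand-in for a per-dataset operation tracker that counts active operations.
    static class OperationTracker {
        private int activeOperations;

        synchronized void beginOperation() { activeOperations++; }

        synchronized void completeOperation() { activeOperations--; }

        synchronized int getActiveOperations() { return activeOperations; }
    }

    // Stand-in for one transaction context shared by all connections of a feed job.
    static class TransactionContext {
        final long txnId;
        // One tracker per dataset (primary index) registered with this transaction.
        final Map<Integer, OperationTracker> trackersByDatasetId = new HashMap<>();

        TransactionContext(long txnId) { this.txnId = txnId; }

        void registerDataset(int datasetId, OperationTracker tracker) {
            trackersByDatasetId.put(datasetId, tracker);
        }

        // Entity commit: the log record carries the dataset id, so the commit runtime
        // can decrement the correct tracker even though the txn id is shared.
        void entityCommit(int datasetId) {
            trackersByDatasetId.get(datasetId).completeOperation();
        }
    }

    public static void main(String[] args) {
        TransactionContext txn = new TransactionContext(42L); // txn id shared by all connections
        OperationTracker datasetA = new OperationTracker();
        OperationTracker datasetB = new OperationTracker();
        txn.registerDataset(1, datasetA); // dataset ids are made up for illustration
        txn.registerDataset(2, datasetB);

        datasetA.beginOperation();
        datasetB.beginOperation();
        txn.entityCommit(1); // decrements only the tracker for dataset 1
        System.out.println(datasetA.getActiveOperations() + " " + datasetB.getActiveOperations());
    }
}

Read this way, Xikui's proposal would have each feed connection register its target dataset with the one shared context, so the transaction is started and committed once per feed job while per-dataset bookkeeping still works.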
