Not sure whether this conversation is related to the concept of
"transactor" in
https://cwiki.apache.org/confluence/display/ASTERIXDB/Deadlock-Free+Locking+Protocol
.

Best,
Taewoo

On Thu, Nov 16, 2017 at 3:41 PM, Xikui Wang <[email protected]> wrote:

> How about we separate the ingestion part from the rest? We can create Job0
> for the ingestion which takes data from the datasource, and create Job1,
> Job2, ... for the connections to dataset1, dataset2, dataset3
> respectively... We would need to pay the resource overhead still, but the
> synchronization can be avoided. (I'm in the same camp with you, Abdullah. I
> just want to pick your brain to see how far this idea can go. :) )
>
> If we want to keep multiple transactions in a single job and keep the
> transaction id out of the job specification, we need to let the commit
> runtime get the right transaction id from somewhere... Any good idea on
> this?
>
> Best,
> Xikui
>
> On Thu, Nov 16, 2017 at 3:10 PM, abdullah alamoudi <[email protected]>
> wrote:
>
> > We are using multiple transactions in a single job in case of feed and I
> > think that this is the correct way.
> > Having a single job for a feed that feeds into multiple datasets is a
> good
> > thing since job resources/feed resources are consolidated.
> >
> > Here are some points:
> > - We can't use the same transaction id to feed multiple datasets. The
> only
> > other option is to have multiple jobs each feeding a different dataset.
> > - Having multiple jobs (in addition to the extra resources used, memory
> > and CPU) would then force us to either read data from external sources
> > multiple times, parse records multiple times, etc.,
> >   or to have synchronization between the different jobs and the
> > feed source within asterixdb. IMO, this is far more complicated than having
> > multiple transactions within a single job and the cost far outweighs the
> > benefits.
> >
> > P.S.
> > We are also using this for bucket connections in Couchbase Analytics.
> >
> > > On Nov 16, 2017, at 2:57 PM, Till Westmann <[email protected]> wrote:
> > >
> > > If there are a number of issues with supporting multiple transaction ids
> > > and no clear benefits/use-cases, I’d vote for simplification :)
> > > Also, code that’s not being used has a tendency to "rot" and so I think
> > > that its usefulness might be limited by the time we’d find a use for
> > > this functionality.
> > >
> > > My 2c,
> > > Till
> > >
> > > On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> > >
> > >> I'm separating the connections into different jobs in some of my
> > >> experiments... but that was intended to be used for the experimental
> > >> settings (i.e., not for master now)...
> > >>
> > >> I think the interesting question here is whether we want to allow one
> > >> Hyracks job to carry multiple transactions. I personally think that
> > should
> > >> be allowed as the transaction and job are two separate concepts, but I
> > >> couldn't find such use cases other than the feeds. Does anyone have a
> > good
> > >> example on this?
> > >>
> > >> Another question is, if we do allow multiple transactions in a single
> > >> Hyracks job, how do we enable commit runtime to obtain the correct TXN
> > id
> > >> without having that embedded as part of the job specification.
> > >>
> > >> Best,
> > >> Xikui
> > >>
> > >> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
> [email protected]>
> > >> wrote:
> > >>
> > >>> I am curious as to how feed will work without this?
> > >>>
> > >>> ~Abdullah.
> > >>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <[email protected]>
> wrote:
> > >>>>
> > >>>> Hi all,
> > >>>> We currently have MultiTransactionJobletEventListenerFactory, which
> > >>> allows
> > >>>> for one Hyracks job to run multiple Asterix transactions together.
> > >>>>
> > >>>> This class is only used by feeds, and feeds are in the process of
> > changing to
> > >>>> no longer need this feature. As part of the work in pre-deploying
> job
> > >>>> specifications to be used by multiple hyracks jobs, I've been
> working
> > on
> > >>>> removing the transaction id from the job specifications, as we use a
> > new
> > >>>> transaction for each invocation of a deployed job.
> > >>>>
> > >>>> There is currently no clear way to remove the transaction id from
> the
> > job
> > >>>> spec and keep the option for MultiTransactionJobletEventListenerFactory.
> > >>>>
> > >>>> The question for the group is, do we see a need to maintain this
> class
> > >>> that
> > >>>> will no longer be used by any current code? Or, in other words, is
> > there
> > >>> a
> > >>>> strong possibility that in the future we will want multiple
> > transactions
> > >>> to
> > >>>> share a single Hyracks job, meaning that it is worth figuring out
> how
> > to
> > >>>> maintain this class?
> > >>>>
> > >>>> Steven
> > >>>
> > >>>
> >
> >
>
