Not sure whether this conversation is related to the concept of "transactor" in https://cwiki.apache.org/confluence/display/ASTERIXDB/Deadlock-Free+Locking+Protocol .
Best,
Taewoo

On Thu, Nov 16, 2017 at 3:41 PM, Xikui Wang <[email protected]> wrote:
> How about we separate the ingestion part from the rest? We can create Job0 for the ingestion, which takes data from the data source, and create Job1, Job2, ... for the connections to dataset1, dataset2, dataset3, respectively... We would still need to pay the resource overhead, but the synchronization can be avoided. (I'm in the same camp as you, Abdullah. I just want to pick your brain to see how far this idea can go. :) )
>
> If we want to keep multiple transactions in a single job and keep the transaction id out of the job specification, we need to let the commit runtime get the right transaction id from somewhere... Any good ideas on this?
>
> Best,
> Xikui
>
> On Thu, Nov 16, 2017 at 3:10 PM, abdullah alamoudi <[email protected]> wrote:
>
> > We are using multiple transactions in a single job in the case of feeds, and I think that this is the correct way.
> > Having a single job for a feed that feeds into multiple datasets is a good thing, since job resources/feed resources are consolidated.
> >
> > Here are some points:
> > - We can't use the same transaction id to feed multiple datasets. The only other option is to have multiple jobs, each feeding a different dataset.
> > - Having multiple jobs (in addition to the extra resources used, memory and CPU) would then force us either to read data from external sources multiple times, parse records multiple times, etc., or to synchronize the different jobs and the feed source within AsterixDB. IMO, this is far more complicated than having multiple transactions within a single job, and the cost far outweighs the benefits.
> >
> > P.S.
> > We are also using this for bucket connections in Couchbase Analytics.
> >
> > > On Nov 16, 2017, at 2:57 PM, Till Westmann <[email protected]> wrote:
> > >
> > > If there are a number of issues with supporting multiple transaction ids and no clear benefits/use cases, I’d vote for simplification :)
> > > Also, code that’s not being used has a tendency to "rot", so I think that its usefulness might be limited by the time we’d find a use for this functionality.
> > >
> > > My 2c,
> > > Till
> > >
> > > On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> > >
> > >> I'm separating the connections into different jobs in some of my experiments... but that was intended for the experimental settings (i.e., not for master now)...
> > >>
> > >> I think the interesting question here is whether we want to allow one Hyracks job to carry multiple transactions. I personally think that should be allowed, as the transaction and the job are two separate concepts, but I couldn't find such use cases other than feeds. Does anyone have a good example of this?
> > >>
> > >> Another question is: if we do allow multiple transactions in a single Hyracks job, how do we enable the commit runtime to obtain the correct TXN id without having it embedded as part of the job specification?
> > >>
> > >> Best,
> > >> Xikui
> > >>
> > >> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <[email protected]> wrote:
> > >>
> > >>> I am curious as to how feeds will work without this?
> > >>>
> > >>> ~Abdullah.
> > >>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <[email protected]> wrote:
> > >>>>
> > >>>> Hi all,
> > >>>> We currently have MultiTransactionJobletEventListenerFactory, which allows one Hyracks job to run multiple Asterix transactions together.
> > >>>>
> > >>>> This class is only used by feeds, and feeds are in the process of changing to no longer need this feature.
> > >>>> As part of the work on pre-deploying job specifications to be used by multiple Hyracks jobs, I've been working on removing the transaction id from the job specifications, as we use a new transaction for each invocation of a deployed job.
> > >>>>
> > >>>> There is currently no clear way to remove the transaction id from the job spec and keep the option for MultiTransactionJobletEventListenerFactory.
> > >>>>
> > >>>> The question for the group is: do we see a need to maintain this class, which will no longer be used by any current code? Or, in other words, is there a strong possibility that in the future we will want multiple transactions to share a single Hyracks job, meaning that it is worth figuring out how to maintain this class?
> > >>>>
> > >>>> Steven
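The arrangement debated in this thread is one job that reads and parses each record once and fans it out to several datasets, with each dataset connection running under its own transaction. In this sketch the commit step does not read its transaction id from the job specification. Instead it looks the id up in a job-scoped context that is created when the job is invoked.

This is a minimal Java sketch under those assumptions. All names (`TxnIdFactory`, `JobTxnContext`, `CommitRuntime`, `runFeedJob`) are hypothetical and are not AsterixDB or Hyracks APIs. It does not model the real `MultiTransactionJobletEventListenerFactory` or commit operator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these classes exist in AsterixDB/Hyracks.
public class MultiTxnJobSketch {

    /** Stand-in for a transaction manager that hands out fresh ids. */
    static final class TxnIdFactory {
        private final AtomicLong next = new AtomicLong(1);

        long create() {
            return next.getAndIncrement();
        }
    }

    /**
     * Job-scoped context built when a job invocation starts. The job spec
     * only names the target datasets; ids are assigned here, one per
     * dataset, for this invocation only.
     */
    static final class JobTxnContext {
        private final Map<String, Long> txnByDataset = new HashMap<>();

        JobTxnContext(List<String> datasets, TxnIdFactory factory) {
            for (String ds : datasets) {
                txnByDataset.put(ds, factory.create());
            }
        }

        /** What a commit step would call instead of reading the spec. */
        long txnFor(String dataset) {
            Long id = txnByDataset.get(dataset);
            if (id == null) {
                throw new IllegalArgumentException("no transaction for " + dataset);
            }
            return id;
        }
    }

    /** Simplified per-dataset commit step: resolves its txn id at run time. */
    static final class CommitRuntime {
        private final String dataset;
        private final List<String> log;

        CommitRuntime(String dataset, List<String> log) {
            this.dataset = dataset;
            this.log = log;
        }

        void commit(JobTxnContext ctx, String record) {
            // Stand-in for writing a commit log record under this dataset's txn.
            log.add("txn " + ctx.txnFor(dataset) + " commits " + record + " to " + dataset);
        }
    }

    /** One invocation: parse each record once, fan out to every dataset. */
    static List<String> runFeedJob(List<String> datasets, List<String> rawRecords,
                                   TxnIdFactory factory) {
        JobTxnContext ctx = new JobTxnContext(datasets, factory);
        List<String> log = new ArrayList<>();
        List<CommitRuntime> sinks = new ArrayList<>();
        for (String ds : datasets) {
            sinks.add(new CommitRuntime(ds, log));
        }
        for (String raw : rawRecords) {
            String parsed = raw.trim(); // "parsing" happens once, shared by all sinks
            for (CommitRuntime sink : sinks) {
                sink.commit(ctx, parsed);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        TxnIdFactory factory = new TxnIdFactory();
        runFeedJob(List.of("dataset1", "dataset2"), List.of(" r1 ", " r2 "), factory)
                .forEach(System.out::println);
    }
}
```

Calling `runFeedJob` twice with the same factory gives transaction ids 1 and 2 for the first invocation and 3 and 4 for the second. This is one way to get a new transaction for each invocation of a deployed job without storing any id in the spec. The cost moves to the job-start step, which must know the set of target datasets.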
