How about we separate the ingestion part from the rest? We could create Job0 for the ingestion, which takes data from the data source, and Job1, Job2, ... for the connections to dataset1, dataset2, dataset3, ..., respectively. We would still need to pay the resource overhead, but the synchronization could be avoided. (I'm in the same camp as you, Abdullah. I just want to pick your brain to see how far this idea can go. :) )
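A minimal sketch of this split, assuming a simple queue-based hand-off between Job0 and the connection jobs (the thread does not say how records would move between jobs, so the queues are an assumption). Every name here (`ingestion_job`, `connection_job`, `run_feed`, `parse`) is hypothetical and not an AsterixDB or Hyracks API. It shows Job0 parsing each record once and fanning it out, while each connection job commits under its own transaction id.

```python
import itertools
import queue
import threading

# Hypothetical sketch: none of these names are AsterixDB or Hyracks APIs.
_txn_ids = itertools.count(1)  # stand-in for a transaction id allocator
_END = object()                # end-of-stream marker sent by the ingestion job


def parse(raw):
    """Toy parser: turns "key=value" into {"key": "value"}."""
    key, value = raw.split("=", 1)
    return {key: value}


def ingestion_job(source, inboxes):
    """Job0: read and parse every record exactly once, then fan it out."""
    for raw in source:
        record = parse(raw)
        for inbox in inboxes.values():
            inbox.put(record)
    for inbox in inboxes.values():
        inbox.put(_END)  # tell every connection job that the stream is done


def connection_job(dataset, inbox, txn_id, committed):
    """Job1..JobN: one job per dataset, each under its own transaction id."""
    batch = []
    while True:
        record = inbox.get()
        if record is _END:
            break
        batch.append(record)
    committed[dataset] = (txn_id, batch)  # stand-in for committing the batch


def run_feed(source, datasets):
    inboxes = {ds: queue.Queue() for ds in datasets}
    committed = {}
    # Transaction ids are allocated here, in the launching thread, one per job.
    workers = [
        threading.Thread(
            target=connection_job,
            args=(ds, inboxes[ds], next(_txn_ids), committed),
        )
        for ds in datasets
    ]
    for w in workers:
        w.start()
    ingestion_job(source, inboxes)
    for w in workers:
        w.join()
    return committed
```

Calling `run_feed(["a=1", "b=2"], ["dataset1", "dataset2", "dataset3"])` gives each dataset both parsed records under a distinct transaction id. The extra threads and buffers stand in for the per-job resource overhead mentioned above.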
If we want to keep multiple transactions in a single job and keep the transaction id out of the job specification, we need to let the commit runtime get the right transaction id from somewhere... Any good ideas on this?
Best,
Xikui

On Thu, Nov 16, 2017 at 3:10 PM, abdullah alamoudi <[email protected]> wrote:
> We are using multiple transactions in a single job in the case of feeds, and I
> think that this is the correct way.
> Having a single job for a feed that feeds into multiple datasets is a good
> thing since job resources/feed resources are consolidated.
>
> Here are some points:
> - We can't use the same transaction id to feed multiple datasets. The only
> other option is to have multiple jobs, each feeding a different dataset.
> - Having multiple jobs (in addition to the extra resources used, memory
> and CPU) would then force us either to read data from external sources
> multiple times, parse records multiple times, etc.,
> or to have synchronization between the different jobs and the
> feed source within AsterixDB. IMO, this is far more complicated than having
> multiple transactions within a single job, and the costs far outweigh the
> benefits.
>
> P.S.
> We are also using this for bucket connections in Couchbase Analytics.
>
> > On Nov 16, 2017, at 2:57 PM, Till Westmann <[email protected]> wrote:
> >
> > If there are a number of issues with supporting multiple transaction ids
> > and no clear benefits/use-cases, I’d vote for simplification :)
> > Also, code that’s not being used has a tendency to "rot", and so I think
> > that its usefulness might be limited by the time we’d find a use for
> > this functionality.
> >
> > My 2c,
> > Till
> >
> > On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> >
> >> I'm separating the connections into different jobs in some of my
> >> experiments... but that was intended to be used for the experimental
> >> settings (i.e., not for master now)...
> >>
> >> I think the interesting question here is whether we want to allow one
> >> Hyracks job to carry multiple transactions. I personally think that should
> >> be allowed, as transactions and jobs are two separate concepts, but I
> >> couldn't find such use cases other than feeds. Does anyone have a good
> >> example of this?
> >>
> >> Another question is, if we do allow multiple transactions in a single
> >> Hyracks job, how do we enable the commit runtime to obtain the correct TXN id
> >> without having it embedded as part of the job specification?
> >>
> >> Best,
> >> Xikui
> >>
> >> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <[email protected]>
> >> wrote:
> >>
> >>> I am curious as to how feeds will work without this?
> >>>
> >>> ~Abdullah.
> >>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <[email protected]> wrote:
> >>>>
> >>>> Hi all,
> >>>> We currently have MultiTransactionJobletEventListenerFactory, which allows
> >>>> one Hyracks job to run multiple Asterix transactions together.
> >>>>
> >>>> This class is only used by feeds, and feeds are in the process of changing to
> >>>> no longer need this feature. As part of the work on pre-deploying job
> >>>> specifications to be used by multiple Hyracks jobs, I've been working on
> >>>> removing the transaction id from the job specifications, as we use a new
> >>>> transaction for each invocation of a deployed job.
> >>>>
> >>>> There is currently no clear way to remove the transaction id from the job
> >>>> spec and keep the option for MultiTransactionJobletEventListenerFactory.
> >>>>
> >>>> The question for the group is, do we see a need to maintain this class that
> >>>> will no longer be used by any current code? Or, in other words, is there a
> >>>> strong possibility that in the future we will want multiple transactions to
> >>>> share a single Hyracks job, meaning that it is worth figuring out how to
> >>>> maintain this class?
> >>>>
> >>>> Steven
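One possible answer to the question raised in this thread (how a commit runtime could find the right transaction id when one deployed job carries several transactions and the job specification holds none) is to allocate the ids when each invocation starts and key them by dataset. This is a hypothetical sketch: `JobSpec`, `JobInvocation`, `TxnIdAllocator`, `start_job` and `commit` are illustrative names, not AsterixDB or Hyracks APIs, and the real MultiTransactionJobletEventListenerFactory may work quite differently.

```python
import itertools
from dataclasses import dataclass

# Hypothetical sketch: these names are illustrative, not AsterixDB or Hyracks APIs.


@dataclass(frozen=True)
class JobSpec:
    """Deployed once and reused; it names target datasets but holds no txn ids."""
    name: str
    target_datasets: tuple


@dataclass
class JobInvocation:
    """Per-run context: maps each target dataset to this run's transaction id."""
    spec: JobSpec
    txn_ids: dict


class TxnIdAllocator:
    def __init__(self):
        self._counter = itertools.count(1)

    def next_id(self):
        return next(self._counter)


def start_job(spec, allocator):
    # A fresh transaction per dataset, per invocation: several txns in one job.
    return JobInvocation(spec, {ds: allocator.next_id() for ds in spec.target_datasets})


def commit(invocation, dataset, records, log):
    # The commit runtime looks its txn id up in the invocation context by dataset,
    # so nothing transaction-specific has to live in the deployed JobSpec.
    txn_id = invocation.txn_ids[dataset]
    log.append((txn_id, dataset, len(records)))
    return txn_id
```

Because the spec is frozen and id-free, the same deployed spec can be started repeatedly, each time with new transaction ids, which is in line with the "new transaction for each invocation of a deployed job" goal. Whether the actual commit operator could be keyed by dataset this way is an assumption, not something the thread establishes.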
