Also, I was wondering how you would do the same for a single dataset (non-feed). How would you get the transaction id and change it when you re-run?
On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <[email protected]> wrote:

> For atomic transactions, the change was merged yesterday. For entity level
> transactions, it should be a very small change.
>
> Cheers,
> Murtadha
>
> > On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <[email protected]> wrote:
> >
> > I understand that is not the case right now, but is it what you're working on?
> >
> > Cheers,
> > Abdullah.
> >
> >> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <[email protected]> wrote:
> >>
> >> A transaction context can register multiple primary indexes.
> >> Since each entity commit log contains the dataset id, you can decrement
> >> the active operations on the operation tracker associated with that
> >> dataset id.
> >>
> >> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <[email protected]> wrote:
> >>
> >> Can you illustrate how a deadlock can happen? I am anxious to know.
> >> Moreover, the reason for the multiple transaction ids in feeds is not
> >> simply because we compile them differently.
> >>
> >> How would a commit operator know which dataset active operation
> >> counter to decrement if they share the same id for example?
> >>
> >>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <[email protected]> wrote:
> >>>
> >>> Yes. That deadlock could happen. Currently, we have one-to-one mappings
> >>> for the jobs and transactions, except for the feeds.
> >>>
> >>> @Abdullah, after some digging into the code, I think probably we can use
> >>> a single transaction id for the job which feeds multiple datasets? See if
> >>> I can convince you. :)
> >>>
> >>> The reason we have multiple transaction ids in feeds is that we compile
> >>> each connection job separately and combine them into a single feed job. A
> >>> new transaction id is created and assigned to each connection job, thus
> >>> for the combined job, we have to handle the different transactions as they
> >>> are embedded in the connection job specifications.
> >>> But, what if we create a single transaction id for the combined job?
> >>> That transaction id will be embedded into each connection so they can
> >>> write logs freely, but the transaction will be started and committed
> >>> only once as there is only one feed job. In this way, we won't need
> >>> multiTransactionJobletEventListener and the transaction id can be removed
> >>> from the job specification easily as well (for Steven's change).
> >>>
> >>> Best,
> >>> Xikui
> >>>
> >>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <[email protected]> wrote:
> >>>>
> >>>> I worry about deadlocks. The waits-for graph may not understand that
> >>>> making t1 wait will also make t2 wait since they may share a thread -
> >>>> right? Or do we have jobs and transactions separately represented there
> >>>> now?
> >>>>
> >>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <[email protected]> wrote:
> >>>>>
> >>>>> We are using multiple transactions in a single job in the case of feeds,
> >>>>> and I think that this is the correct way.
> >>>>> Having a single job for a feed that feeds into multiple datasets is a
> >>>>> good thing since job resources/feed resources are consolidated.
> >>>>>
> >>>>> Here are some points:
> >>>>> - We can't use the same transaction id to feed multiple datasets. The
> >>>>>   only other option is to have multiple jobs each feeding a different
> >>>>>   dataset.
> >>>>> - Having multiple jobs (in addition to the extra resources used, memory
> >>>>>   and CPU) would then force us either to read data from external sources
> >>>>>   multiple times, parse records multiple times, etc., or to have
> >>>>>   synchronization between the different jobs and the feed source within
> >>>>>   AsterixDB. IMO, this is far more complicated than having multiple
> >>>>>   transactions within a single job, and the cost far outweighs the
> >>>>>   benefits.
> >>>>>
> >>>>> P.S.,
> >>>>> We are also using this for bucket connections in Couchbase Analytics.
> >>>>>
> >>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <[email protected]> wrote:
> >>>>>>
> >>>>>> If there are a number of issues with supporting multiple transaction ids
> >>>>>> and no clear benefits/use-cases, I’d vote for simplification :)
> >>>>>> Also, code that’s not being used has a tendency to "rot" and so I think
> >>>>>> that its usefulness might be limited by the time we’d find a use for
> >>>>>> this functionality.
> >>>>>>
> >>>>>> My 2c,
> >>>>>> Till
> >>>>>>
> >>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> >>>>>>>
> >>>>>>> I'm separating the connections into different jobs in some of my
> >>>>>>> experiments... but that was intended to be used for the experimental
> >>>>>>> settings (i.e., not for master now)...
> >>>>>>>
> >>>>>>> I think the interesting question here is whether we want to allow one
> >>>>>>> Hyracks job to carry multiple transactions. I personally think that
> >>>>>>> should be allowed as the transaction and job are two separate concepts,
> >>>>>>> but I couldn't find such use cases other than the feeds. Does anyone
> >>>>>>> have a good example of this?
> >>>>>>>
> >>>>>>> Another question is, if we do allow multiple transactions in a single
> >>>>>>> Hyracks job, how do we enable the commit runtime to obtain the correct
> >>>>>>> TXN id without having that embedded as part of the job specification?
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Xikui
> >>>>>>>
> >>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <[email protected]>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> I am curious as to how feeds will work without this?
> >>>>>>>>
> >>>>>>>> ~Abdullah.
> >>>>>>>>
> >>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory, which
> >>>>>>>>> allows for one Hyracks job to run multiple Asterix transactions
> >>>>>>>>> together.
> >>>>>>>>>
> >>>>>>>>> This class is only used by feeds, and feeds are in the process of
> >>>>>>>>> changing to no longer need this feature. As part of the work in
> >>>>>>>>> pre-deploying job specifications to be used by multiple Hyracks jobs,
> >>>>>>>>> I've been working on removing the transaction id from the job
> >>>>>>>>> specifications, as we use a new transaction for each invocation of a
> >>>>>>>>> deployed job.
> >>>>>>>>>
> >>>>>>>>> There is currently no clear way to remove the transaction id from the
> >>>>>>>>> job spec and keep the option for
> >>>>>>>>> MultiTransactionJobletEventListenerFactory.
> >>>>>>>>>
> >>>>>>>>> The question for the group is, do we see a need to maintain this class
> >>>>>>>>> that will no longer be used by any current code? Or, in other words, is
> >>>>>>>>> there a strong possibility that in the future we will want multiple
> >>>>>>>>> transactions to share a single Hyracks job, meaning that it is worth
> >>>>>>>>> figuring out how to maintain this class?
> >>>>>>>>>
> >>>>>>>>> Steven
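
For anyone reading the thread later, here is a rough sketch of the bookkeeping Murtadha describes above: one transaction context registers several primary indexes, and the dataset id carried by each entity commit log selects which operation tracker to decrement. It is only a toy model in Python, not AsterixDB code. Every name in it (OperationTracker, EntityCommitLog, TransactionContext, register_primary_index, begin_operation, on_entity_commit, demo) is made up for illustration, and the real classes and signatures may look quite different.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class OperationTracker:
    """Hypothetical per-dataset counter of in-flight operations."""
    dataset_id: int
    active_operations: int = 0


@dataclass
class EntityCommitLog:
    """Hypothetical entity commit log record; it carries the dataset id."""
    txn_id: int
    dataset_id: int


@dataclass
class TransactionContext:
    """One transaction id shared by several datasets (assumed design)."""
    txn_id: int
    trackers: Dict[int, OperationTracker] = field(default_factory=dict)

    def register_primary_index(self, tracker: OperationTracker) -> None:
        # A single context can register the primary index of many datasets.
        self.trackers[tracker.dataset_id] = tracker

    def begin_operation(self, dataset_id: int) -> None:
        # Count an in-flight operation against the dataset it touches.
        self.trackers[dataset_id].active_operations += 1

    def on_entity_commit(self, log: EntityCommitLog) -> None:
        # The shared transaction id does not identify the dataset,
        # so the dataset id in the log record picks the tracker.
        if log.txn_id != self.txn_id:
            raise ValueError("log record belongs to another transaction")
        self.trackers[log.dataset_id].active_operations -= 1


def demo() -> Dict[int, int]:
    """Feed two datasets under one transaction id and commit one entity."""
    ctx = TransactionContext(txn_id=7)
    ctx.register_primary_index(OperationTracker(dataset_id=101))
    ctx.register_primary_index(OperationTracker(dataset_id=102))
    ctx.begin_operation(101)
    ctx.begin_operation(101)
    ctx.begin_operation(102)
    ctx.on_entity_commit(EntityCommitLog(txn_id=7, dataset_id=101))
    return {d: t.active_operations for d, t in ctx.trackers.items()}
```

Calling demo() leaves one active operation on each dataset, which is the point of Abdullah's question: the commit side never needs a separate transaction id per dataset as long as the log record names the dataset.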
