Can you illustrate how a deadlock can happen? I am anxious to know.
Moreover, the reason for the multiple transaction ids in feeds is not
simply that we compile them differently. How would a commit operator
know which dataset's active-operation counter to decrement if they
shared the same id, for example?
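
To make the scenario concrete, here is a minimal, hypothetical Java
sketch (toy class names, not the actual AsterixDB lock manager) of the
situation Mike describes below: t1 and t2 belong to the same feed job
and run on the same pipeline thread, t1 waits for a lock held by an
unrelated transaction t3, and t3 waits for a lock held by t2. A
waits-for graph that only tracks transaction-to-transaction waits sees
no cycle, yet t2 can never run again to release its lock, because its
thread is parked inside t1's lock wait.

    import java.util.*;

    // Toy waits-for graph: nodes are transaction ids, an edge t -> u means
    // "t waits for a lock held by u". Hypothetical and simplified; not the
    // real lock manager.
    class WaitsForGraph {
        private final Map<String, Set<String>> edges = new HashMap<>();

        void addWait(String waiter, String holder) {
            edges.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
        }

        // Deadlock detection here is just cycle detection over transactions.
        boolean hasCycle() {
            Set<String> visiting = new HashSet<>(), done = new HashSet<>();
            for (String t : edges.keySet()) {
                if (dfs(t, visiting, done)) {
                    return true;
                }
            }
            return false;
        }

        private boolean dfs(String t, Set<String> visiting, Set<String> done) {
            if (done.contains(t)) {
                return false;
            }
            if (!visiting.add(t)) {
                return true; // back edge -> cycle
            }
            for (String u : edges.getOrDefault(t, Set.of())) {
                if (dfs(u, visiting, done)) {
                    return true;
                }
            }
            visiting.remove(t);
            done.add(t);
            return false;
        }
    }

    class SharedThreadDeadlock {
        public static void main(String[] args) {
            WaitsForGraph g = new WaitsForGraph();
            // t1 and t2 are carried by the same feed job and share one thread.
            // t1 blocks on a lock held by an unrelated transaction t3 ...
            g.addWait("t1", "t3");
            // ... and t3 is waiting for a lock held by t2.
            g.addWait("t3", "t2");
            // The graph sees no cycle, because t2 is not waiting on anything.
            System.out.println("cycle detected? " + g.hasCycle()); // false
            // But t2 shares its thread with t1, which is parked inside its
            // lock wait, so t2 never runs again to release the lock t3 needs:
            // the job stalls even though no deadlock is reported.
        }
    }

Whether the real lock manager is exposed to exactly this depends on how
lock waits and job threads interact, which is what the question above is
getting at.
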
> On Nov 16, 2017, at 9:46 PM, Xikui Wang <[email protected]> wrote:
>
> Yes. That deadlock could happen. Currently, we have one-to-one mappings
> between jobs and transactions, except for the feeds.
>
> @Abdullah, after some digging into the code, I think we can probably use
> a single transaction id for a job which feeds multiple datasets? See if
> I can convince you. :)
>
> The reason we have multiple transaction ids in feeds is that we compile
> each connection job separately and combine them into a single feed job.
> A new transaction id is created and assigned to each connection job,
> thus for the combined job we have to handle the different transactions
> as they are embedded in the connection job specifications. But what if
> we created a single transaction id for the combined job? That
> transaction id would be embedded into each connection so they could
> write logs freely, but the transaction would be started and committed
> only once, as there is only one feed job. In this way, we wouldn't need
> MultiTransactionJobletEventListener, and the transaction id could easily
> be removed from the job specification as well (for Steven's change).
>
> Best,
> Xikui
>
> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <[email protected]> wrote:
>
>> I worry about deadlocks. The waits-for graph may not understand that
>> making t1 wait will also make t2 wait, since they may share a thread -
>> right? Or do we have jobs and transactions separately represented there
>> now?
>>
>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <[email protected]> wrote:
>>
>>> We are using multiple transactions in a single job in the case of
>>> feeds, and I think that this is the correct way.
>>> Having a single job for a feed that feeds into multiple datasets is a
>>> good thing, since job resources/feed resources are consolidated.
>>>
>>> Here are some points:
>>> - We can't use the same transaction id to feed multiple datasets. The
>>> only other option is to have multiple jobs, each feeding a different
>>> dataset.
>>> - Having multiple jobs (in addition to the extra memory and CPU used)
>>> would then force us either to read data from external sources multiple
>>> times, parse records multiple times, etc., or to synchronize between
>>> the different jobs and the feed source within AsterixDB. IMO, this is
>>> far more complicated than having multiple transactions within a single
>>> job, and the costs far outweigh the benefits.
>>>
>>> P.S.
>>> We are also using this for bucket connections in Couchbase Analytics.
>>>
>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <[email protected]> wrote:
>>>>
>>>> If there are a number of issues with supporting multiple transaction
>>>> ids and no clear benefits/use-cases, I'd vote for simplification :)
>>>> Also, code that's not being used has a tendency to "rot", so I think
>>>> its usefulness might be limited by the time we'd find a use for this
>>>> functionality.
>>>>
>>>> My 2c,
>>>> Till
>>>>
>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>>
>>>>> I'm separating the connections into different jobs in some of my
>>>>> experiments... but that was intended to be used for the experimental
>>>>> settings (i.e., not for master now)...
>>>>>
>>>>> I think the interesting question here is whether we want to allow
>>>>> one Hyracks job to carry multiple transactions. I personally think
>>>>> that should be allowed, as the transaction and the job are two
>>>>> separate concepts, but I couldn't find such use cases other than the
>>>>> feeds. Does anyone have a good example of this?
>>>>>
>>>>> Another question is, if we do allow multiple transactions in a
>>>>> single Hyracks job, how do we enable the commit runtime to obtain
>>>>> the correct TXN id without having it embedded as part of the job
>>>>> specification?
>>>>>
>>>>> Best,
>>>>> Xikui
>>>>>
>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> I am curious as to how feeds will work without this?
>>>>>>
>>>>>> ~Abdullah.
>>>>>>
>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>> We currently have MultiTransactionJobletEventListenerFactory,
>>>>>>> which allows one Hyracks job to run multiple Asterix transactions
>>>>>>> together.
>>>>>>>
>>>>>>> This class is only used by feeds, and feeds are in the process of
>>>>>>> changing to no longer need this feature. As part of the work on
>>>>>>> pre-deploying job specifications to be used by multiple Hyracks
>>>>>>> jobs, I've been working on removing the transaction id from the
>>>>>>> job specifications, as we use a new transaction for each
>>>>>>> invocation of a deployed job.
>>>>>>>
>>>>>>> There is currently no clear way to remove the transaction id from
>>>>>>> the job spec and keep the option for
>>>>>>> MultiTransactionJobletEventListenerFactory.
>>>>>>>
>>>>>>> The question for the group is, do we see a need to maintain this
>>>>>>> class that will no longer be used by any current code? Or, in
>>>>>>> other words, is there a strong possibility that in the future we
>>>>>>> will want multiple transactions to share a single Hyracks job,
>>>>>>> meaning that it is worth figuring out how to maintain this class?
>>>>>>>
>>>>>>> Steven
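
On the question of how the commit runtime would learn the correct TXN id
once it is no longer embedded in the job specification, one possible
shape, sketched below with purely hypothetical names (none of these are
the real Hyracks/AsterixDB interfaces), is to keep the dataset id as a
compile-time constant in the operator and hand the transaction id over
as a per-invocation job parameter:

    // Hypothetical sketch of a commit runtime that resolves its transaction id
    // at invocation time instead of reading it from the compiled job spec.
    final class CommitRuntimeSketch {

        // Stand-in for whatever per-invocation parameters accompany a job start.
        interface JobParameters {
            long transactionId();                       // one id per invocation (or per feed job)
            void decrementActiveOpCount(int datasetId); // per-dataset bookkeeping
        }

        private final int datasetId; // baked into the (reusable) job spec
        private long txnId;          // resolved when this invocation opens

        CommitRuntimeSketch(int datasetId) {
            this.datasetId = datasetId;
        }

        void open(JobParameters params) {
            // Same job spec, different invocation => different transaction id.
            txnId = params.transactionId();
        }

        void close(JobParameters params) {
            // A commit record would carry (txnId, datasetId); in this sketch the
            // active-operation counter is keyed on the dataset id, which the
            // operator knows without consulting the transaction id at all.
            params.decrementActiveOpCount(datasetId);
        }

        long currentTxnId() {
            return txnId;
        }
    }

Under a sketch like this, a pre-deployed specification could be invoked
many times with a fresh transaction id each time, and the
active-operation counter from the top of the thread would be keyed on
the dataset id the operator already carries rather than on the
transaction id; whether that fits the actual commit and recovery path is
exactly what this thread is debating.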
