I think that this works, and it seems like the question now is how the different operators in the job can get their transaction ids.
~Abdullah.

> On Nov 17, 2017, at 11:21 AM, Steven Jacobs <[email protected]> wrote:
>
> From the conversation, it seems like nobody has the full picture to propose
> the design?
> For deployed jobs, the idea is to use the same job specification but create
> a new Hyracks job and Asterix Transaction for each execution.
>
> Steven
>
> On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <[email protected]>
> wrote:
>
>> I can e-meet anytime (moved to Sunnyvale). We can also look at a proposed
>> design and see if it can work.
>> Back to my question, how were you planning to change the transaction id if
>> we forget about the case with multiple datasets (feed job)?
>>
>>
>>> On Nov 17, 2017, at 10:38 AM, Steven Jacobs <[email protected]> wrote:
>>>
>>> Maybe it would be good to have a meeting about this with all interested
>>> parties?
>>>
>>> I can be on-campus at UCI on Tuesday if that would be a good day to meet.
>>>
>>> Steven
>>>
>>> On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <[email protected]>
>>> wrote:
>>>
>>>> Also, I was wondering how you would do the same for a single dataset
>>>> (non-feed). How would you get the transaction id and change it when you
>>>> re-run?
>>>>
>>>> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <[email protected]> wrote:
>>>>
>>>>> For atomic transactions, the change was merged yesterday. For entity
>>>>> level transactions, it should be a very small change.
>>>>>
>>>>> Cheers,
>>>>> Murtadha
>>>>>
>>>>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> I understand that is not the case right now, but is it what you're
>>>>>> working on?
>>>>>>
>>>>>> Cheers,
>>>>>> Abdullah.
>>>>>>
>>>>>>
>>>>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> A transaction context can register multiple primary indexes.
>>>>>>> Since each entity commit log contains the dataset id, you can
>>>>>>> decrement the active operations on the operation tracker
>>>>>>> associated with that dataset id.
>>>>>>>
>>>>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Can you illustrate how a deadlock can happen? I am anxious to know.
>>>>>>> Moreover, the reason for the multiple transaction ids in feeds is
>>>>>>> not simply because we compile them differently.
>>>>>>>
>>>>>>> How would a commit operator know which dataset active operation
>>>>>>> counter to decrement if they share the same id for example?
>>>>>>>
>>>>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Yes. That deadlock could happen. Currently, we have one-to-one
>>>>>>>> mappings for the jobs and transactions, except for the feeds.
>>>>>>>>
>>>>>>>> @Abdullah, after some digging into the code, I think probably we
>>>>>>>> can use a single transaction id for the job which feeds multiple
>>>>>>>> datasets? See if I can convince you. :)
>>>>>>>>
>>>>>>>> The reason we have multiple transaction ids in feeds is that we
>>>>>>>> compile each connection job separately and combine them into a
>>>>>>>> single feed job. A new transaction id is created and assigned to
>>>>>>>> each connection job, thus for the combined job, we have to handle
>>>>>>>> the different transactions as they are embedded in the connection
>>>>>>>> job specifications. But, what if we create a single transaction id
>>>>>>>> for the combined job? That transaction id will be embedded into
>>>>>>>> each connection so they can write logs freely, but the transaction
>>>>>>>> will be started and committed only once as there is only one
>>>>>>>> feed job.
>>>>>>>> In this way, we won't need multiTransactionJobletEventListener
>>>>>>>> and the transaction id can be removed from the job specification
>>>>>>>> easily as well (for Steven's change).
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Xikui
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> I worry about deadlocks. The waits-for graph may not understand
>>>>>>>>> that making t1 wait will also make t2 wait since they may share a
>>>>>>>>> thread - right? Or do we have jobs and transactions separately
>>>>>>>>> represented there now?
>>>>>>>>>
>>>>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> We are using multiple transactions in a single job in the case of
>>>>>>>>>> feeds, and I think that this is the correct way.
>>>>>>>>>> Having a single job for a feed that feeds into multiple datasets
>>>>>>>>>> is a good thing since job resources/feed resources are
>>>>>>>>>> consolidated.
>>>>>>>>>>
>>>>>>>>>> Here are some points:
>>>>>>>>>> - We can't use the same transaction id to feed multiple datasets.
>>>>>>>>>> The only other option is to have multiple jobs, each feeding a
>>>>>>>>>> different dataset.
>>>>>>>>>> - Having multiple jobs (in addition to the extra resources used,
>>>>>>>>>> memory and CPU) would then force us either to read data from
>>>>>>>>>> external sources multiple times, parse records multiple times,
>>>>>>>>>> etc., or to have synchronization between the different jobs and
>>>>>>>>>> the feed source within asterixdb. IMO, this is far more
>>>>>>>>>> complicated than having multiple transactions within a single job
>>>>>>>>>> and the cost far outweighs the benefits.
>>>>>>>>>>
>>>>>>>>>> P.S.,
>>>>>>>>>> We are also using this for bucket connections in Couchbase
>>>>>>>>>> Analytics.
>>>>>>>>>>
>>>>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> If there are a number of issues with supporting multiple
>>>>>>>>>>> transaction ids and no clear benefits/use-cases, I’d vote for
>>>>>>>>>>> simplification :)
>>>>>>>>>>> Also, code that’s not being used has a tendency to "rot" and so I
>>>>>>>>>>> think that its usefulness might be limited by the time we’d find
>>>>>>>>>>> a use for this functionality.
>>>>>>>>>>>
>>>>>>>>>>> My 2c,
>>>>>>>>>>> Till
>>>>>>>>>>>
>>>>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I'm separating the connections into different jobs in some of my
>>>>>>>>>>>> experiments... but that was intended to be used for the
>>>>>>>>>>>> experimental settings (i.e., not for master now)...
>>>>>>>>>>>>
>>>>>>>>>>>> I think the interesting question here is whether we want to
>>>>>>>>>>>> allow one Hyracks job to carry multiple transactions. I
>>>>>>>>>>>> personally think that should be allowed as the transaction and
>>>>>>>>>>>> job are two separate concepts, but I couldn't find such use
>>>>>>>>>>>> cases other than the feeds. Does anyone have a good example of
>>>>>>>>>>>> this?
>>>>>>>>>>>>
>>>>>>>>>>>> Another question is, if we do allow multiple transactions in a
>>>>>>>>>>>> single Hyracks job, how do we enable the commit runtime to
>>>>>>>>>>>> obtain the correct TXN id without having that embedded as part
>>>>>>>>>>>> of the job specification?
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Xikui
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi
>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I am curious as to how feed will work without this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~Abdullah.
>>>>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <[email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory,
>>>>>>>>>>>>>> which allows for one Hyracks job to run multiple Asterix
>>>>>>>>>>>>>> transactions together.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This class is only used by feeds, and feeds are in the process
>>>>>>>>>>>>>> of changing to no longer need this feature. As part of the work
>>>>>>>>>>>>>> in pre-deploying job specifications to be used by multiple
>>>>>>>>>>>>>> Hyracks jobs, I've been working on removing the transaction id
>>>>>>>>>>>>>> from the job specifications, as we use a new transaction for
>>>>>>>>>>>>>> each invocation of a deployed job.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There is currently no clear way to remove the transaction id
>>>>>>>>>>>>>> from the job spec and keep the option for
>>>>>>>>>>>>>> MultiTransactionJobletEventListenerFactory.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The question for the group is, do we see a need to maintain
>>>>>>>>>>>>>> this class that will no longer be used by any current code? Or,
>>>>>>>>>>>>>> in other words, is there a strong possibility that in the
>>>>>>>>>>>>>> future we will want multiple transactions to share a single
>>>>>>>>>>>>>> Hyracks job, meaning that it is worth figuring out how to
>>>>>>>>>>>>>> maintain this class?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Steven
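
For readers following the thread, the single-transaction-id idea (one transaction context registering several datasets, with each entity commit log carrying its dataset id so that the right operation tracker can be decremented) might look roughly like the sketch below. This is only an illustration under stated assumptions: `SingleTxnSketch`, `OperationTracker`, `EntityCommitLog`, `TransactionContext` and all of their methods are hypothetical names made up for this example, not the actual AsterixDB or Hyracks classes, and real concerns such as locking, logging and recovery are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class SingleTxnSketch {

    /** Hypothetical per-dataset tracker counting operations not yet committed. */
    static final class OperationTracker {
        private final AtomicInteger activeOperations = new AtomicInteger();

        void beforeOperation() {
            activeOperations.incrementAndGet();
        }

        void afterEntityCommit() {
            activeOperations.decrementAndGet();
        }

        int active() {
            return activeOperations.get();
        }
    }

    /** Hypothetical entity commit log record: it carries the dataset id. */
    static final class EntityCommitLog {
        final long txnId;
        final int datasetId;

        EntityCommitLog(long txnId, int datasetId) {
            this.txnId = txnId;
            this.datasetId = datasetId;
        }
    }

    /** Hypothetical transaction context shared by every dataset the job feeds. */
    static final class TransactionContext {
        private final long txnId;
        private final Map<Integer, OperationTracker> trackers = new HashMap<>();

        TransactionContext(long txnId) {
            this.txnId = txnId;
        }

        /** Register one dataset (primary index) with this transaction. */
        void register(int datasetId) {
            trackers.putIfAbsent(datasetId, new OperationTracker());
        }

        /** Record one write against a dataset and return its commit log. */
        EntityCommitLog write(int datasetId) {
            tracker(datasetId).beforeOperation();
            return new EntityCommitLog(txnId, datasetId);
        }

        /** Use the dataset id in the log to pick the tracker to decrement. */
        void onEntityCommit(EntityCommitLog log) {
            tracker(log.datasetId).afterEntityCommit();
        }

        int activeOperations(int datasetId) {
            return tracker(datasetId).active();
        }

        private OperationTracker tracker(int datasetId) {
            OperationTracker t = trackers.get(datasetId);
            if (t == null) {
                throw new IllegalStateException("dataset not registered: " + datasetId);
            }
            return t;
        }
    }

    public static void main(String[] args) {
        TransactionContext ctx = new TransactionContext(1L);
        ctx.register(101);
        ctx.register(102);
        EntityCommitLog first = ctx.write(101);
        ctx.write(102);
        ctx.onEntityCommit(first);
        System.out.println("dataset 101 active: " + ctx.activeOperations(101));
        System.out.println("dataset 102 active: " + ctx.activeOperations(102));
    }
}
```

In this sketch both writes share one transaction id, yet the commit path still finds the right counter because the lookup key is the dataset id in the log record rather than the transaction id. Whether the real commit runtime can obtain that id without it being embedded in the job specification is the open question in the thread.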
