From the conversation, it seems like nobody has the full picture to propose the design? For deployed jobs, the idea is to use the same job specification but create a new Hyracks job and Asterix Transaction for each execution.
Steven

On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <[email protected]> wrote:

> I can e-meet anytime (moved to Sunnyvale). We can also look at a proposed
> design and see if it can work
> Back to my question, how were you planning to change the transaction id if
> we forget about the case with multiple datasets (feed job)?
>
>
> > On Nov 17, 2017, at 10:38 AM, Steven Jacobs <[email protected]> wrote:
> >
> > Maybe it would be good to have a meeting about this with all interested
> > parties?
> >
> > I can be on-campus at UCI on Tuesday if that would be a good day to meet.
> >
> > Steven
> >
> > On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <[email protected]>
> > wrote:
> >
> >> Also, was wondering how would you do the same for a single dataset
> >> (non-feed). How would you get the transaction id and change it when you
> >> re-run?
> >>
> >> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <[email protected]> wrote:
> >>
> >>> For atomic transactions, the change was merged yesterday. For entity level
> >>> transactions, it should be a very small change.
> >>>
> >>> Cheers,
> >>> Murtadha
> >>>
> >>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <[email protected]> wrote:
> >>>>
> >>>> I understand that is not the case right now but what you're working on?
> >>>>
> >>>> Cheers,
> >>>> Abdullah.
> >>>>
> >>>>
> >>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <[email protected]> wrote:
> >>>>>
> >>>>> A transaction context can register multiple primary indexes.
> >>>>> Since each entity commit log contains the dataset id, you can decrement the active operations on
> >>>>> the operation tracker associated with that dataset id.
> >>>>>
> >>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <[email protected]> wrote:
> >>>>>
> >>>>> Can you illustrate how a deadlock can happen? I am anxious to know.
> >>>>> Moreover, the reason for the multiple transaction ids in feeds is not simply because we compile them differently.
> >>>>>
> >>>>> How would a commit operator know which dataset active operation counter to decrement if they share the same id for example?
> >>>>>
> >>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <[email protected]> wrote:
> >>>>>>
> >>>>>> Yes. That deadlock could happen. Currently, we have one-to-one mappings for
> >>>>>> the jobs and transactions, except for the feeds.
> >>>>>>
> >>>>>> @Abdullah, after some digging into the code, I think probably we can use a
> >>>>>> single transaction id for the job which feeds multiple datasets? See if I
> >>>>>> can convince you. :)
> >>>>>>
> >>>>>> The reason we have multiple transaction ids in feeds is that we compile
> >>>>>> each connection job separately and combine them into a single feed job. A
> >>>>>> new transaction id is created and assigned to each connection job, thus for
> >>>>>> the combined job, we have to handle the different transactions as they
> >>>>>> are embedded in the connection job specifications. But, what if we create a
> >>>>>> single transaction id for the combined job? That transaction id will be
> >>>>>> embedded into each connection so they can write logs freely, but the
> >>>>>> transaction will be started and committed only once as there is only one
> >>>>>> feed job. In this way, we won't need multiTransactionJobletEventListener
> >>>>>> and the transaction id can be removed from the job specification easily as
> >>>>>> well (for Steven's change).
> >>>>>>
> >>>>>> Best,
> >>>>>> Xikui
> >>>>>>
> >>>>>>
> >>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <[email protected]> wrote:
> >>>>>>>
> >>>>>>> I worry about deadlocks. The waits for graph may not understand that
> >>>>>>> making t1 wait will also make t2 wait since they may share a thread -
> >>>>>>> right? Or do we have jobs and transactions separately represented there
> >>>>>>> now?
> >>>>>>>
> >>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> We are using multiple transactions in a single job in case of feed and I
> >>>>>>>> think that this is the correct way.
> >>>>>>>> Having a single job for a feed that feeds into multiple datasets is a good
> >>>>>>>> thing since job resources/feed resources are consolidated.
> >>>>>>>>
> >>>>>>>> Here are some points:
> >>>>>>>> - We can't use the same transaction id to feed multiple datasets. The only
> >>>>>>>> other option is to have multiple jobs each feeding a different dataset.
> >>>>>>>> - Having multiple jobs (in addition to the extra resources used, memory
> >>>>>>>> and CPU) would then forces us to either read data from external sources
> >>>>>>>> multiple times, parse records multiple times, etc
> >>>>>>>> or having to have a synchronization between the different jobs and the
> >>>>>>>> feed source within asterixdb. IMO, this is far more complicated than having
> >>>>>>>> multiple transactions within a single job and the cost far outweigh the
> >>>>>>>> benefits.
> >>>>>>>>
> >>>>>>>> P.S,
> >>>>>>>> We are also using this for bucket connections in Couchbase Analytics.
> >>>>>>>>
> >>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> If there are a number of issue with supporting multiple transaction ids
> >>>>>>>>> and no clear benefits/use-cases, I’d vote for simplification :)
> >>>>>>>>> Also, code that’s not being used has a tendency to "rot" and so I think
> >>>>>>>>> that it’s usefulness might be limited by the time we’d find a use for
> >>>>>>>>> this functionality.
> >>>>>>>>>
> >>>>>>>>> My 2c,
> >>>>>>>>> Till
> >>>>>>>>>
> >>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
> >>>>>>>>>>
> >>>>>>>>>> I'm separating the connections into different jobs in some of my
> >>>>>>>>>> experiments... but that was intended to be used for the experimental
> >>>>>>>>>> settings (i.e., not for master now)...
> >>>>>>>>>>
> >>>>>>>>>> I think the interesting question here is whether we want to allow one
> >>>>>>>>>> Hyracks job to carry multiple transactions. I personally think that should
> >>>>>>>>>> be allowed as the transaction and job are two separate concepts, but I
> >>>>>>>>>> couldn't find such use cases other than the feeds. Does anyone have a good
> >>>>>>>>>> example on this?
> >>>>>>>>>>
> >>>>>>>>>> Another question is, if we do allow multiple transactions in a single
> >>>>>>>>>> Hyracks job, how do we enable commit runtime to obtain the correct TXN id
> >>>>>>>>>> without having that embedded as part of the job specification.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Xikui
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <[email protected]>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I am curious as to how feed will work without this?
> >>>>>>>>>>>
> >>>>>>>>>>> ~Abdullah.
> >>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi all,
> >>>>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory, which allows
> >>>>>>>>>>>> for one Hyracks job to run multiple Asterix transactions together.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This class is only used by feeds, and feeds are in process of changing to
> >>>>>>>>>>>> no longer need this feature. As part of the work in pre-deploying job
> >>>>>>>>>>>> specifications to be used by multiple hyracks jobs, I've been working on
> >>>>>>>>>>>> removing the transaction id from the job specifications, as we use a new
> >>>>>>>>>>>> transaction for each invocation of a deployed job.
> >>>>>>>>>>>>
> >>>>>>>>>>>> There is currently no clear way to remove the transaction id from the job
> >>>>>>>>>>>> spec and keep the option for MultiTransactionJobletEventListenerFactory.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The question for the group is, do we see a need to maintain this class that
> >>>>>>>>>>>> will no longer be used by any current code? Or, an other words, is there a
> >>>>>>>>>>>> strong possibility that in the future we will want multiple transactions to
> >>>>>>>>>>>> share a single Hyracks job, meaning that it is worth figuring out how to
> >>>>>>>>>>>> maintain this class?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Steven
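
For what it's worth, the deployed-job idea from the top of this message (one job specification, a new Hyracks job and a new Asterix transaction per execution) can be sketched roughly as follows. This is a Python illustration, not AsterixDB code: `JobSpec`, `JobRun`, and `DeployedJobRunner` are hypothetical names, and the two counters only stand in for whatever would really allocate job ids and transaction ids.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical deployed job specification. It carries no transaction id."""
    name: str
    operators: tuple


@dataclass(frozen=True)
class JobRun:
    """One execution of a deployed spec, with ids assigned at invocation time."""
    job_id: int
    txn_id: int
    spec: JobSpec


class DeployedJobRunner:
    """Assumed stand-in for the component that deploys and invokes job specs."""

    def __init__(self):
        # Placeholder id generators; the real sources of ids are not modeled.
        self._job_ids = itertools.count(1)
        self._txn_ids = itertools.count(1)
        self._deployed = {}

    def deploy(self, spec):
        # The spec is stored once and reused unchanged by every invocation.
        self._deployed[spec.name] = spec

    def invoke(self, name):
        # Each invocation gets a fresh job id and a fresh transaction id,
        # so nothing transaction-specific has to live inside the spec.
        spec = self._deployed[name]
        return JobRun(next(self._job_ids), next(self._txn_ids), spec)


runner = DeployedJobRunner()
runner.deploy(JobSpec("insert_into_ds", ("scan", "insert", "commit")))
first = runner.invoke("insert_into_ds")
second = runner.invoke("insert_into_ds")
print(first.txn_id, second.txn_id, first.spec is second.spec)
```

The point of the sketch is only that the transaction id is supplied at invocation time rather than compiled into the specification. How the commit runtime would look it up when one job carries several transactions is the open question in the thread and is not modeled here.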
