If that's true, then that solution seems best to me, but we had discussed this earlier and Xikui mentioned that it might not be true. @Xikui?

Steven
On Fri, Nov 17, 2017 at 11:55 AM, abdullah alamoudi <[email protected]> wrote:

Right now, they can't, so datasetId can be safely used.

On Nov 17, 2017, at 11:51 AM, Steven Jacobs <[email protected]> wrote:

For option 1, I think the dataset id is not a unique identifier. Couldn't multiple transactions in one job work on the same dataset?

Steven

On Fri, Nov 17, 2017 at 11:38 AM, abdullah alamoudi <[email protected]> wrote:

So, there are three options to do this:
1. Each of these operators works on a specific dataset, so we can pass the datasetId to the JobEventListenerFactory when requesting the transaction id.
2. We make one transaction work for multiple datasets by using a map from datasetId to the primary opTracker, and use it when commits are reported by the log flusher thread.
3. Prevent a job from having multiple transactions. (For the record, I dislike this option since the price we pay is very high, IMO.)

Cheers,
Abdullah.

On Nov 17, 2017, at 11:32 AM, Steven Jacobs <[email protected]> wrote:

Well, we've solved the problem when there is only one transaction id per job. The operators can fetch the transaction ids from the JobEventListenerFactory (you can find this in master now). The issue is, when we are trying to combine multiple job specs into one feed job, the operators at runtime don't have a memory of which "job spec" they originally belonged to, which could tell them which one of the transaction ids they should use.

Steven

On Fri, Nov 17, 2017 at 11:25 AM, abdullah alamoudi <[email protected]> wrote:

I think that this works, and it seems like the question is how different operators in the job can get their transaction ids.

~Abdullah.
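Option 2 above could be sketched roughly as follows. All names here (TxnContext, OpTracker) are hypothetical stand-ins for illustration, not AsterixDB's actual classes: one transaction context keeps a map from datasetId to a per-dataset primary operation tracker, and the log flusher uses the dataset id carried in each entity commit log record to pick the tracker to decrement.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative stand-in for a per-dataset primary operation tracker.
class OpTracker {
    private final AtomicInteger activeOps = new AtomicInteger();
    void incrementActiveOps() { activeOps.incrementAndGet(); }
    void decrementActiveOps() { activeOps.decrementAndGet(); }
    int getActiveOps() { return activeOps.get(); }
}

// Illustrative stand-in for a transaction context spanning multiple datasets.
class TxnContext {
    // Option 2: map datasetId -> primary opTracker, so the log flusher can
    // find the right tracker when it sees an entity commit log record.
    private final Map<Integer, OpTracker> trackers = new ConcurrentHashMap<>();

    OpTracker registerDataset(int datasetId) {
        return trackers.computeIfAbsent(datasetId, id -> new OpTracker());
    }

    // Called for each flushed entity commit log record; the record carries
    // the dataset id, which selects the counter to decrement.
    void onEntityCommitFlushed(int datasetId) {
        trackers.get(datasetId).decrementActiveOps();
    }
}

public class MultiDatasetTxnSketch {
    public static void main(String[] args) {
        TxnContext txn = new TxnContext();
        OpTracker ds1 = txn.registerDataset(1);
        OpTracker ds2 = txn.registerDataset(2);
        ds1.incrementActiveOps();
        ds2.incrementActiveOps();
        txn.onEntityCommitFlushed(1); // commit log record for dataset 1 only
        System.out.println(ds1.getActiveOps() + " " + ds2.getActiveOps()); // 0 1
    }
}
```

The point of the sketch is that one transaction id suffices as long as the commit path is keyed by dataset id rather than by transaction id.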
On Nov 17, 2017, at 11:21 AM, Steven Jacobs <[email protected]> wrote:

From the conversation, it seems like nobody has the full picture to propose the design?
For deployed jobs, the idea is to use the same job specification but create a new Hyracks job and Asterix transaction for each execution.

Steven

On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <[email protected]> wrote:

I can e-meet anytime (moved to Sunnyvale). We can also look at a proposed design and see if it can work.
Back to my question: how were you planning to change the transaction id if we forget about the case with multiple datasets (feed job)?

On Nov 17, 2017, at 10:38 AM, Steven Jacobs <[email protected]> wrote:

Maybe it would be good to have a meeting about this with all interested parties?
I can be on campus at UCI on Tuesday if that would be a good day to meet.

Steven

On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <[email protected]> wrote:

Also, I was wondering how you would do the same for a single dataset (non-feed). How would you get the transaction id and change it when you re-run?

On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <[email protected]> wrote:

For atomic transactions, the change was merged yesterday. For entity-level transactions, it should be a very small change.

Cheers,
Murtadha

On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <[email protected]> wrote:

I understand that is not the case right now, but is that what you're working on?

Cheers,
Abdullah.

On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <[email protected]> wrote:

A transaction context can register multiple primary indexes. Since each entity commit log contains the dataset id, you can decrement the active operations on the operation tracker associated with that dataset id.

On 17/11/2017, 5:52 PM, "abdullah alamoudi" <[email protected]> wrote:

Can you illustrate how a deadlock can happen? I am anxious to know.
Moreover, the reason for the multiple transaction ids in feeds is not simply that we compile them differently.
How would a commit operator know which dataset's active-operation counter to decrement if they shared the same id, for example?

On Nov 16, 2017, at 9:46 PM, Xikui Wang <[email protected]> wrote:

Yes. That deadlock could happen. Currently, we have one-to-one mappings between jobs and transactions, except for the feeds.

@Abdullah, after some digging into the code, I think we can probably use a single transaction id for a job that feeds multiple datasets. See if I can convince you. :)

The reason we have multiple transaction ids in feeds is that we compile each connection job separately and combine them into a single feed job. A new transaction id is created and assigned to each connection job, so for the combined job we have to handle the different transactions as they are embedded in the connection job specifications. But what if we create a single transaction id for the combined job? That transaction id would be embedded into each connection so they can write logs freely, but the transaction would be started and committed only once, as there is only one feed job. In this way, we won't need the MultiTransactionJobletEventListener, and the transaction id can easily be removed from the job specification as well (for Steven's change).

Best,
Xikui

On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <[email protected]> wrote:

I worry about deadlocks. The waits-for graph may not understand that making t1 wait will also make t2 wait, since they may share a thread, right? Or do we have jobs and transactions separately represented there now?

On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <[email protected]> wrote:

We are using multiple transactions in a single job in the case of feeds, and I think that this is the correct way.
Having a single job for a feed that feeds into multiple datasets is a good thing, since job resources/feed resources are consolidated.

Here are some points:
- We can't use the same transaction id to feed multiple datasets. The only other option is to have multiple jobs, each feeding a different dataset.
- Having multiple jobs (in addition to the extra memory and CPU used) would then force us either to read data from external sources multiple times, parse records multiple times, etc., or to have synchronization between the different jobs and the feed source within AsterixDB. IMO, this is far more complicated than having multiple transactions within a single job, and the cost far outweighs the benefits.

P.S. We are also using this for bucket connections in Couchbase Analytics.

On Nov 16, 2017, at 2:57 PM, Till Westmann <[email protected]> wrote:

If there are a number of issues with supporting multiple transaction ids and no clear benefits/use-cases, I'd vote for simplification :)
Also, code that's not being used has a tendency to "rot", and so I think that its usefulness might be limited by the time we'd find a use for this functionality.

My 2c,
Till

On 16 Nov 2017, at 13:57, Xikui Wang wrote:

I'm separating the connections into different jobs in some of my experiments... but that was intended for experimental settings (i.e., not for master now)...

I think the interesting question here is whether we want to allow one Hyracks job to carry multiple transactions. I personally think that should be allowed, as the transaction and the job are two separate concepts, but I couldn't find such use cases other than the feeds. Does anyone have a good example of this?

Another question is, if we do allow multiple transactions in a single Hyracks job, how do we enable the commit runtime to obtain the correct TXN id without having it embedded as part of the job specification?

Best,
Xikui

On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <[email protected]> wrote:

I am curious as to how feeds will work without this?

~Abdullah.
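The direction discussed above (fetching the transaction id from the job's event listener at runtime rather than baking it into the job specification) could be sketched roughly like this. The class names below mirror the ones mentioned in the thread but are simplified stand-ins, not AsterixDB's actual implementations:

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified stand-in: one listener per job invocation, holding that
// invocation's transaction id.
class JobEventListener {
    private final long txnId;
    JobEventListener(long txnId) { this.txnId = txnId; }
    long getTxnId() { return txnId; }
}

// Simplified stand-in for the JobEventListenerFactory: minting a fresh
// transaction id for each invocation of the (reused) job specification.
class JobEventListenerFactory {
    private static final AtomicLong NEXT_TXN_ID = new AtomicLong();
    JobEventListener createListener() {
        return new JobEventListener(NEXT_TXN_ID.incrementAndGet());
    }
}

// A commit operator asks the listener for the txn id at runtime instead of
// reading an id embedded in the job specification.
class CommitOperator {
    private final JobEventListener listener;
    CommitOperator(JobEventListener listener) { this.listener = listener; }
    long currentTxnId() { return listener.getTxnId(); }
}

public class DeployedJobTxnSketch {
    public static void main(String[] args) {
        JobEventListenerFactory factory = new JobEventListenerFactory();
        // Same deployed job spec, two invocations, two distinct txn ids.
        CommitOperator run1 = new CommitOperator(factory.createListener());
        CommitOperator run2 = new CommitOperator(factory.createListener());
        System.out.println(run1.currentTxnId() != run2.currentTxnId()); // true
    }
}
```

This is also why combining multiple job specs into one feed job is awkward under the current scheme: with one listener per job, an operator has no way to know which of several embedded transaction ids was "its own".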
On Nov 16, 2017, at 12:43 PM, Steven Jacobs <[email protected]> wrote:

Hi all,
We currently have the MultiTransactionJobletEventListenerFactory, which allows one Hyracks job to run multiple Asterix transactions together.

This class is only used by feeds, and feeds are in the process of changing to no longer need this feature. As part of the work on pre-deploying job specifications to be used by multiple Hyracks jobs, I've been working on removing the transaction id from the job specifications, as we use a new transaction for each invocation of a deployed job.

There is currently no clear way to remove the transaction id from the job spec and keep the option of the MultiTransactionJobletEventListenerFactory.

The question for the group is: do we see a need to maintain this class, which will no longer be used by any current code? Or, in other words, is there a strong possibility that in the future we will want multiple transactions to share a single Hyracks job, meaning that it is worth figuring out how to maintain this class?

Steven
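The deadlock concern raised in the thread (two transactions sharing a thread) can be illustrated with a toy waits-for graph. Everything below is hypothetical, not AsterixDB's actual lock manager: the graph detects explicit cycles between transactions, but a stall induced by t1 and t2 sharing a thread never shows up as an edge, so the cycle check misses it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy waits-for graph: nodes are transaction ids, edges mean "waits for".
public class WaitsForSketch {
    private final Map<String, Set<String>> waitsFor = new HashMap<>();

    void addEdge(String waiter, String holder) {
        waitsFor.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    // Detects explicit cycles only; implicit shared-thread dependencies
    // (t1 blocked => t2 blocked, because they run on one thread) are invisible.
    boolean hasCycleFrom(String txn) {
        return dfs(txn, txn, new HashSet<>());
    }

    private boolean dfs(String start, String current, Set<String> seen) {
        for (String next : waitsFor.getOrDefault(current, Set.of())) {
            if (next.equals(start)) return true;
            if (seen.add(next) && dfs(start, next, seen)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        WaitsForSketch g = new WaitsForSketch();
        // t3 waits for t2; t1 waits for t3. If t1 and t2 share a thread,
        // blocking t1 also blocks t2: a real deadlock, but no graph cycle.
        g.addEdge("t3", "t2");
        g.addEdge("t1", "t3");
        System.out.println(g.hasCycleFrom("t1")); // false: the graph can't see it
    }
}
```

This is essentially Mike's point: unless jobs (threads) and transactions are represented separately in the waits-for structure, multiple transactions per job can deadlock undetected.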
