Maybe it would be good to have a meeting about this with all interested parties?
I can be on-campus at UCI on Tuesday if that would be a good day to meet.

Steven

On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <[email protected]> wrote:

> Also, I was wondering how you would do the same for a single dataset (non-feed). How would you get the transaction id and change it when you re-run?

On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <[email protected]> wrote:

> For atomic transactions, the change was merged yesterday. For entity-level transactions, it should be a very small change.
>
> Cheers,
> Murtadha

On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <[email protected]> wrote:

> I understand that is not the case right now, but is that what you're working on?
>
> Cheers,
> Abdullah.

On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <[email protected]> wrote:

> A transaction context can register multiple primary indexes. Since each entity commit log contains the dataset id, you can decrement the active operations on the operation tracker associated with that dataset id.

On 17/11/2017, 5:52 PM, "abdullah alamoudi" <[email protected]> wrote:

> Can you illustrate how a deadlock can happen? I am anxious to know. Moreover, the reason for the multiple transaction ids in feeds is not simply because we compile them differently.
>
> How would a commit operator know which dataset's active operation counter to decrement if they share the same id, for example?

On Nov 16, 2017, at 9:46 PM, Xikui Wang <[email protected]> wrote:

> Yes. That deadlock could happen. Currently, we have one-to-one mappings between jobs and transactions, except for the feeds.
>
> @Abdullah, after some digging into the code, I think we can probably use a single transaction id for the job that feeds multiple datasets? See if I can convince you. :)
>
> The reason we have multiple transaction ids in feeds is that we compile each connection job separately and combine them into a single feed job. A new transaction id is created and assigned to each connection job, so for the combined job we have to handle the different transactions as they are embedded in the connection job specifications. But what if we create a single transaction id for the combined job? That transaction id will be embedded into each connection so they can write logs freely, but the transaction will be started and committed only once, as there is only one feed job. In this way, we won't need MultiTransactionJobletEventListener, and the transaction id can be removed from the job specification easily as well (for Steven's change).
>
> Best,
> Xikui

On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <[email protected]> wrote:

> I worry about deadlocks. The waits-for graph may not understand that making t1 wait will also make t2 wait, since they may share a thread - right? Or do we have jobs and transactions separately represented there now?

On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <[email protected]> wrote:

> We are using multiple transactions in a single job in the case of feeds, and I think that this is the correct way. Having a single job for a feed that feeds into multiple datasets is a good thing since job resources/feed resources are consolidated.
>
> Here are some points:
> - We can't use the same transaction id to feed multiple datasets. The only other option is to have multiple jobs, each feeding a different dataset.
> - Having multiple jobs (in addition to the extra resources used, memory and CPU) would then force us to either read data from external sources multiple times, parse records multiple times, etc., or have synchronization between the different jobs and the feed source within AsterixDB. IMO, this is far more complicated than having multiple transactions within a single job, and the costs far outweigh the benefits.
>
> P.S. We are also using this for bucket connections in Couchbase Analytics.

On Nov 16, 2017, at 2:57 PM, Till Westmann <[email protected]> wrote:

> If there are a number of issues with supporting multiple transaction ids and no clear benefits/use cases, I'd vote for simplification :)
> Also, code that's not being used has a tendency to "rot", and so I think that its usefulness might be limited by the time we'd find a use for this functionality.
>
> My 2c,
> Till

On 16 Nov 2017, at 13:57, Xikui Wang wrote:

> I'm separating the connections into different jobs in some of my experiments... but that was intended to be used for the experimental settings (i.e., not for master now)...
>
> I think the interesting question here is whether we want to allow one Hyracks job to carry multiple transactions. I personally think that should be allowed, as the transaction and the job are two separate concepts, but I couldn't find such use cases other than the feeds. Does anyone have a good example of this?
>
> Another question is, if we do allow multiple transactions in a single Hyracks job, how do we enable the commit runtime to obtain the correct TXN id without having it embedded as part of the job specification?
>
> Best,
> Xikui

On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <[email protected]> wrote:

> I am curious as to how feeds will work without this?
>
> ~Abdullah.

On Nov 16, 2017, at 12:43 PM, Steven Jacobs <[email protected]> wrote:

> Hi all,
> We currently have MultiTransactionJobletEventListenerFactory, which allows one Hyracks job to run multiple Asterix transactions together.
>
> This class is only used by feeds, and feeds are in the process of changing to no longer need this feature. As part of the work on pre-deploying job specifications to be used by multiple Hyracks jobs, I've been working on removing the transaction id from the job specifications, as we use a new transaction for each invocation of a deployed job.
>
> There is currently no clear way to remove the transaction id from the job spec and keep the option for MultiTransactionJobletEventListenerFactory.
>
> The question for the group is, do we see a need to maintain this class that will no longer be used by any current code? Or, in other words, is there a strong possibility that in the future we will want multiple transactions to share a single Hyracks job, meaning that it is worth figuring out how to maintain this class?
>
> Steven
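One way to picture Murtadha's point above (a single transaction context can cover several primary indexes, and the dataset id carried in each entity commit log tells the commit runtime which operation tracker to decrement) is the minimal sketch below. This is a hypothetical, plain-Java illustration only: the names TransactionContext, OperationTracker, entityCommit, and the dataset ids are invented for the sketch and are not the actual AsterixDB/Hyracks API.

import java.util.HashMap;
import java.util.Map;

public class SharedTxnCommitSketch {

    // Stand-in for a per-dataset operation tracker that counts active operations.
    static class OperationTracker {
        private int activeOperations;

        synchronized void beginOperation() { activeOperations++; }

        synchronized void completeOperation() { activeOperations--; }

        synchronized int getActiveOperations() { return activeOperations; }
    }

    // Stand-in for one transaction context shared by all connections of a feed job.
    static class TransactionContext {
        final long txnId;
        // One tracker per dataset (primary index) registered with this transaction.
        final Map<Integer, OperationTracker> trackersByDatasetId = new HashMap<>();

        TransactionContext(long txnId) { this.txnId = txnId; }

        void registerDataset(int datasetId, OperationTracker tracker) {
            trackersByDatasetId.put(datasetId, tracker);
        }

        // Entity commit: the log record carries the dataset id, so the commit runtime
        // can decrement the correct tracker even though the txn id is shared.
        void entityCommit(int datasetId) {
            trackersByDatasetId.get(datasetId).completeOperation();
        }
    }

    public static void main(String[] args) {
        TransactionContext txn = new TransactionContext(42L); // txn id shared by all connections
        OperationTracker datasetA = new OperationTracker();
        OperationTracker datasetB = new OperationTracker();
        txn.registerDataset(1, datasetA); // dataset ids are made up for illustration
        txn.registerDataset(2, datasetB);

        datasetA.beginOperation();
        datasetB.beginOperation();
        txn.entityCommit(1); // decrements only the tracker for dataset 1
        System.out.println(datasetA.getActiveOperations() + " " + datasetB.getActiveOperations());
    }
}

Read this way, Xikui's proposal would have each feed connection register its target dataset with the one shared context, so the transaction is started and committed once per feed job while per-dataset bookkeeping still works.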
