If that's true, then that solution seems best to me, but we had discussed this earlier and Xikui mentioned that it might not be true. @Xikui?

Steven
On Fri, Nov 17, 2017 at 11:55 AM, abdullah alamoudi <[email protected]> wrote:

Right now, they can't, so datasetId can be safely used.

On Nov 17, 2017, at 11:51 AM, Steven Jacobs <[email protected]> wrote:

For option 1, I think the dataset id is not a unique identifier. Couldn't multiple transactions in one job work on the same dataset?

Steven

On Fri, Nov 17, 2017 at 11:38 AM, abdullah alamoudi <[email protected]> wrote:

So, there are three options to do this:
1. Each of these operators works on a specific dataset, so we can pass the datasetId to the JobEventListenerFactory when requesting the transaction id.
2. We make one transaction work for multiple datasets by using a map from datasetId to the primary opTracker, and use it when commits are reported by the log flusher thread.
3. Prevent a job from having multiple transactions. (For the record, I dislike this option since the price we pay is very high, IMO.)

Cheers,
Abdullah.

On Nov 17, 2017, at 11:32 AM, Steven Jacobs <[email protected]> wrote:

Well, we've solved the problem when there is only one transaction id per job. The operators can fetch the transaction ids from the JobEventListenerFactory (you can find this in master now). The issue is, when we are trying to combine multiple job specs into one feed job, the operators at runtime don't have a memory of which "job spec" they originally belonged to, which could tell them which one of the transaction ids they should use.

Steven

On Fri, Nov 17, 2017 at 11:25 AM, abdullah alamoudi <[email protected]> wrote:

I think that this works, and it seems like the question is how different operators in the job can get their transaction ids.

~Abdullah.
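Option 2 above could be sketched roughly as follows. All names here (TxnContext, OpTracker) are hypothetical stand-ins for illustration, not AsterixDB's actual classes: one transaction context keeps a map from datasetId to a per-dataset primary operation tracker, and the log flusher uses the dataset id carried in each entity commit log record to pick the tracker to decrement.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative stand-in for a per-dataset primary operation tracker.
class OpTracker {
    private final AtomicInteger activeOps = new AtomicInteger();
    void incrementActiveOps() { activeOps.incrementAndGet(); }
    void decrementActiveOps() { activeOps.decrementAndGet(); }
    int getActiveOps() { return activeOps.get(); }
}

// Illustrative stand-in for a transaction context spanning multiple datasets.
class TxnContext {
    // Option 2: map datasetId -> primary opTracker, so the log flusher can
    // find the right tracker when it sees an entity commit log record.
    private final Map<Integer, OpTracker> trackers = new ConcurrentHashMap<>();

    OpTracker registerDataset(int datasetId) {
        return trackers.computeIfAbsent(datasetId, id -> new OpTracker());
    }

    // Called for each flushed entity commit log record; the record carries
    // the dataset id, which selects the counter to decrement.
    void onEntityCommitFlushed(int datasetId) {
        trackers.get(datasetId).decrementActiveOps();
    }
}

public class MultiDatasetTxnSketch {
    public static void main(String[] args) {
        TxnContext txn = new TxnContext();
        OpTracker ds1 = txn.registerDataset(1);
        OpTracker ds2 = txn.registerDataset(2);
        ds1.incrementActiveOps();
        ds2.incrementActiveOps();
        txn.onEntityCommitFlushed(1); // commit log record for dataset 1 only
        System.out.println(ds1.getActiveOps() + " " + ds2.getActiveOps()); // 0 1
    }
}
```

The point of the sketch is that one transaction id suffices as long as the commit path is keyed by dataset id rather than by transaction id.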
On Nov 17, 2017, at 11:21 AM, Steven Jacobs <[email protected]> wrote:

From the conversation, it seems like nobody has the full picture to propose the design?
For deployed jobs, the idea is to use the same job specification but create a new Hyracks job and Asterix transaction for each execution.

Steven

On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <[email protected]> wrote:

I can e-meet anytime (moved to Sunnyvale). We can also look at a proposed design and see if it can work.
Back to my question: how were you planning to change the transaction id if we forget about the case with multiple datasets (feed job)?

On Nov 17, 2017, at 10:38 AM, Steven Jacobs <[email protected]> wrote:

Maybe it would be good to have a meeting about this with all interested parties?
I can be on campus at UCI on Tuesday if that would be a good day to meet.

Steven

On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <[email protected]> wrote:

Also, I was wondering how you would do the same for a single dataset (non-feed). How would you get the transaction id and change it when you re-run?

On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <[email protected]> wrote:

For atomic transactions, the change was merged yesterday. For entity-level transactions, it should be a very small change.

Cheers,
Murtadha

On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <[email protected]> wrote:

I understand that is not the case right now, but is that what you're working on?

Cheers,
Abdullah.

On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <[email protected]> wrote:

A transaction context can register multiple primary indexes. Since each entity commit log contains the dataset id, you can decrement the active operations on the operation tracker associated with that dataset id.

On 17/11/2017, 5:52 PM, "abdullah alamoudi" <[email protected]> wrote:

Can you illustrate how a deadlock can happen? I am anxious to know.
Moreover, the reason for the multiple transaction ids in feeds is not simply that we compile them differently.
How would a commit operator know which dataset's active-operation counter to decrement if they shared the same id, for example?

On Nov 16, 2017, at 9:46 PM, Xikui Wang <[email protected]> wrote:

Yes. That deadlock could happen. Currently, we have one-to-one mappings between jobs and transactions, except for the feeds.

@Abdullah, after some digging into the code, I think we can probably use a single transaction id for a job that feeds multiple datasets. See if I can convince you. :)

The reason we have multiple transaction ids in feeds is that we compile each connection job separately and combine them into a single feed job. A new transaction id is created and assigned to each connection job, so for the combined job we have to handle the different transactions as they are embedded in the connection job specifications. But what if we create a single transaction id for the combined job? That transaction id would be embedded into each connection so they can write logs freely, but the transaction would be started and committed only once, as there is only one feed job. In this way, we won't need the MultiTransactionJobletEventListener, and the transaction id can easily be removed from the job specification as well (for Steven's change).

Best,
Xikui

On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <[email protected]> wrote:

I worry about deadlocks. The waits-for graph may not understand that making t1 wait will also make t2 wait, since they may share a thread, right? Or do we have jobs and transactions separately represented there now?

On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <[email protected]> wrote:

We are using multiple transactions in a single job in the case of feeds, and I think that this is the correct way.
Having a single job for a feed that feeds into multiple datasets is a good thing, since job resources/feed resources are consolidated.

Here are some points:
- We can't use the same transaction id to feed multiple datasets. The only other option is to have multiple jobs, each feeding a different dataset.
- Having multiple jobs (in addition to the extra memory and CPU used) would then force us either to read data from external sources multiple times, parse records multiple times, etc., or to have synchronization between the different jobs and the feed source within AsterixDB. IMO, this is far more complicated than having multiple transactions within a single job, and the cost far outweighs the benefits.

P.S. We are also using this for bucket connections in Couchbase Analytics.

On Nov 16, 2017, at 2:57 PM, Till Westmann <[email protected]> wrote:

If there are a number of issues with supporting multiple transaction ids and no clear benefits/use-cases, I'd vote for simplification :)
Also, code that's not being used has a tendency to "rot", and so I think that its usefulness might be limited by the time we'd find a use for this functionality.

My 2c,
Till

On 16 Nov 2017, at 13:57, Xikui Wang wrote:

I'm separating the connections into different jobs in some of my experiments... but that was intended for experimental settings (i.e., not for master now)...

I think the interesting question here is whether we want to allow one Hyracks job to carry multiple transactions. I personally think that should be allowed, as the transaction and the job are two separate concepts, but I couldn't find such use cases other than the feeds. Does anyone have a good example of this?

Another question is, if we do allow multiple transactions in a single Hyracks job, how do we enable the commit runtime to obtain the correct TXN id without having it embedded as part of the job specification?

Best,
Xikui

On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <[email protected]> wrote:

I am curious as to how feeds will work without this?

~Abdullah.
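The direction discussed above (fetching the transaction id from the job's event listener at runtime rather than baking it into the job specification) could be sketched roughly like this. The class names below mirror the ones mentioned in the thread but are simplified stand-ins, not AsterixDB's actual implementations:

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified stand-in: one listener per job invocation, holding that
// invocation's transaction id.
class JobEventListener {
    private final long txnId;
    JobEventListener(long txnId) { this.txnId = txnId; }
    long getTxnId() { return txnId; }
}

// Simplified stand-in for the JobEventListenerFactory: minting a fresh
// transaction id for each invocation of the (reused) job specification.
class JobEventListenerFactory {
    private static final AtomicLong NEXT_TXN_ID = new AtomicLong();
    JobEventListener createListener() {
        return new JobEventListener(NEXT_TXN_ID.incrementAndGet());
    }
}

// A commit operator asks the listener for the txn id at runtime instead of
// reading an id embedded in the job specification.
class CommitOperator {
    private final JobEventListener listener;
    CommitOperator(JobEventListener listener) { this.listener = listener; }
    long currentTxnId() { return listener.getTxnId(); }
}

public class DeployedJobTxnSketch {
    public static void main(String[] args) {
        JobEventListenerFactory factory = new JobEventListenerFactory();
        // Same deployed job spec, two invocations, two distinct txn ids.
        CommitOperator run1 = new CommitOperator(factory.createListener());
        CommitOperator run2 = new CommitOperator(factory.createListener());
        System.out.println(run1.currentTxnId() != run2.currentTxnId()); // true
    }
}
```

This is also why combining multiple job specs into one feed job is awkward under the current scheme: with one listener per job, an operator has no way to know which of several embedded transaction ids was "its own".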
On Nov 16, 2017, at 12:43 PM, Steven Jacobs <[email protected]> wrote:

Hi all,
We currently have the MultiTransactionJobletEventListenerFactory, which allows one Hyracks job to run multiple Asterix transactions together.

This class is only used by feeds, and feeds are in the process of changing to no longer need this feature. As part of the work on pre-deploying job specifications to be used by multiple Hyracks jobs, I've been working on removing the transaction id from the job specifications, as we use a new transaction for each invocation of a deployed job.

There is currently no clear way to remove the transaction id from the job spec and keep the option of the MultiTransactionJobletEventListenerFactory.

The question for the group is: do we see a need to maintain this class, which will no longer be used by any current code? Or, in other words, is there a strong possibility that in the future we will want multiple transactions to share a single Hyracks job, meaning that it is worth figuring out how to maintain this class?

Steven
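The deadlock concern raised in the thread (two transactions sharing a thread) can be illustrated with a toy waits-for graph. Everything below is hypothetical, not AsterixDB's actual lock manager: the graph detects explicit cycles between transactions, but a stall induced by t1 and t2 sharing a thread never shows up as an edge, so the cycle check misses it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy waits-for graph: nodes are transaction ids, edges mean "waits for".
public class WaitsForSketch {
    private final Map<String, Set<String>> waitsFor = new HashMap<>();

    void addEdge(String waiter, String holder) {
        waitsFor.computeIfAbsent(waiter, k -> new HashSet<>()).add(holder);
    }

    // Detects explicit cycles only; implicit shared-thread dependencies
    // (t1 blocked => t2 blocked, because they run on one thread) are invisible.
    boolean hasCycleFrom(String txn) {
        return dfs(txn, txn, new HashSet<>());
    }

    private boolean dfs(String start, String current, Set<String> seen) {
        for (String next : waitsFor.getOrDefault(current, Set.of())) {
            if (next.equals(start)) return true;
            if (seen.add(next) && dfs(start, next, seen)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        WaitsForSketch g = new WaitsForSketch();
        // t3 waits for t2; t1 waits for t3. If t1 and t2 share a thread,
        // blocking t1 also blocks t2: a real deadlock, but no graph cycle.
        g.addEdge("t3", "t2");
        g.addEdge("t1", "t3");
        System.out.println(g.hasCycleFrom("t1")); // false: the graph can't see it
    }
}
```

This is essentially Mike's point: unless jobs (threads) and transactions are represented separately in the waits-for structure, multiple transactions per job can deadlock undetected.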
