Right now, they can't, so datasetId can be safely used.
> On Nov 17, 2017, at 11:51 AM, Steven Jacobs <[email protected]> wrote:
> 
> For option 1, I think the dataset id is not a unique identifier. Couldn't
> multiple transactions in one job work on the same dataset?
> 
> Steven
> 
> On Fri, Nov 17, 2017 at 11:38 AM, abdullah alamoudi <[email protected]>
> wrote:
> 
>> So, there are three options to do this:
>> 1. Each of these operators works on a specific dataset. So we can pass
>> the datasetId to the JobEventListenerFactory when requesting the
>> transaction id.
>> 2. We make one transaction work for multiple datasets by using a map from
>> datasetId to primary opTracker, which the log flusher thread uses when
>> reporting commits.
>> 3. Prevent a job from having multiple transactions. (For the record, I
>> dislike this option since the price we pay is very high IMO)
>> 
>> Cheers,
>> Abdullah.
>> 
>>> On Nov 17, 2017, at 11:32 AM, Steven Jacobs <[email protected]> wrote:
>>> 
>>> Well, we've solved the problem when there is only one transaction id per
>>> job: the operators can fetch the transaction id from the
>>> JobEventListenerFactory (you can find this in master now). The issue is
>>> that when we combine multiple job specs into one feed job, the operators
>>> at runtime have no memory of which "job spec" they originally belonged
>>> to, which would tell them which of the transaction ids they should use.
>>> 
>>> Steven
>>> 
>>> On Fri, Nov 17, 2017 at 11:25 AM, abdullah alamoudi <[email protected]>
>>> wrote:
>>> 
>>>> 
>>>> I think that this works, and it seems like the question is how
>>>> different operators in the job can get their transaction ids.
>>>> 
>>>> ~Abdullah.
>>>> 
>>>>> On Nov 17, 2017, at 11:21 AM, Steven Jacobs <[email protected]> wrote:
>>>>> 
>>>>> From the conversation, it seems like nobody has the full picture to
>>>>> propose the design?
>>>>> For deployed jobs, the idea is to use the same job specification but
>>>>> create a new Hyracks job and Asterix Transaction for each execution.
>>>>> 
>>>>> Steven
>>>>> 
>>>>> On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <
>> [email protected]>
>>>>> wrote:
>>>>> 
>>>>>> I can e-meet anytime (moved to Sunnyvale). We can also look at a
>>>>>> proposed design and see if it can work.
>>>>>> Back to my question: how were you planning to change the transaction
>>>>>> id if we forget about the case with multiple datasets (feed job)?
>>>>>> 
>>>>>> 
>>>>>>> On Nov 17, 2017, at 10:38 AM, Steven Jacobs <[email protected]>
>> wrote:
>>>>>>> 
>>>>>>> Maybe it would be good to have a meeting about this with all
>>>>>>> interested parties?
>>>>>>> 
>>>>>>> I can be on-campus at UCI on Tuesday if that would be a good day to
>>>>>>> meet.
>>>>>>> 
>>>>>>> Steven
>>>>>>> 
>>>>>>> On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <
>> [email protected]
>>>>> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Also, I was wondering how you would do the same for a single dataset
>>>>>>>> (non-feed). How would you get the transaction id and change it when
>>>>>>>> you re-run?
>>>>>>>> 
>>>>>>>> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <[email protected]>
>>>> wrote:
>>>>>>>> 
>>>>>>>>> For atomic transactions, the change was merged yesterday. For
>>>>>>>>> entity-level transactions, it should be a very small change.
>>>>>>>>> 
>>>>>>>>> Cheers,
>>>>>>>>> Murtadha
>>>>>>>>> 
>>>>>>>>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <
>> [email protected]>
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> I understand that is not the case right now, but is it what
>>>>>>>>>> you're working on?
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> Abdullah.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <
>> [email protected]>
>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> A transaction context can register multiple primary indexes.
>>>>>>>>>>> Since each entity commit log contains the dataset id, you can
>>>>>>>>>>> decrement the active operations on the operation tracker
>>>>>>>>>>> associated with that dataset id.
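The mechanism described above could be sketched roughly as follows. All class and method names here are hypothetical placeholders, not AsterixDB's actual classes:

```java
// Hypothetical sketch: one transaction context registers the operation
// tracker of every primary index it touches; the dataset id carried in an
// entity commit log record selects which tracker to decrement.
// Names are illustrative, not AsterixDB's actual API.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PrimaryIndexOperationTracker {
    private int activeOperations;
    synchronized void beginOperation() { activeOperations++; }
    synchronized void completeOperation() { activeOperations--; }
    synchronized int getActiveOperations() { return activeOperations; }
}

class TransactionContext {
    private final Map<Integer, PrimaryIndexOperationTracker> trackers =
            new ConcurrentHashMap<>();

    void registerPrimaryIndex(int datasetId, PrimaryIndexOperationTracker t) {
        trackers.put(datasetId, t);
    }

    // Called when the log flusher sees an entity commit log record; the
    // dataset id in the record picks the right operation tracker.
    void onEntityCommit(int datasetId) {
        trackers.get(datasetId).completeOperation();
    }
}
```

With this shape, a single transaction can span several datasets and commits are still attributed to the right dataset's tracker.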
>>>>>>>>>>> 
>>>>>>>>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <[email protected]>
>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Can you illustrate how a deadlock can happen? I am anxious to
>>>>>>>>>>> know. Moreover, the reason for the multiple transaction ids in
>>>>>>>>>>> feeds is not simply that we compile them differently.
>>>>>>>>>>> 
>>>>>>>>>>> How would a commit operator know which dataset's active operation
>>>>>>>>>>> counter to decrement if they shared the same id, for example?
>>>>>>>>>>> 
>>>>>>>>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <[email protected]> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Yes. That deadlock could happen. Currently, we have one-to-one
>>>>>>>>>>>> mappings between jobs and transactions, except for the feeds.
>>>>>>>>>>>> 
>>>>>>>>>>>> @Abdullah, after some digging into the code, I think we can
>>>>>>>>>>>> probably use a single transaction id for a job that feeds
>>>>>>>>>>>> multiple datasets. See if I can convince you. :)
>>>>>>>>>>>> 
>>>>>>>>>>>> The reason we have multiple transaction ids in feeds is that we
>>>>>>>>>>>> compile each connection job separately and combine them into a
>>>>>>>>>>>> single feed job. A new transaction id is created and assigned to
>>>>>>>>>>>> each connection job, thus for the combined job, we have to handle
>>>>>>>>>>>> the different transactions as they are embedded in the connection
>>>>>>>>>>>> job specifications. But, what if we create a single transaction
>>>>>>>>>>>> id for the combined job? That transaction id will be embedded
>>>>>>>>>>>> into each connection so they can write logs freely, but the
>>>>>>>>>>>> transaction will be started and committed only once, as there is
>>>>>>>>>>>> only one feed job. In this way, we won't need
>>>>>>>>>>>> multiTransactionJobletEventListener and the transaction id can be
>>>>>>>>>>>> removed from the job specification easily as well (for Steven's
>>>>>>>>>>>> change).
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Xikui
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <[email protected]
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I worry about deadlocks. The waits-for graph may not understand
>>>>>>>>>>>>> that making t1 wait will also make t2 wait, since they may share
>>>>>>>>>>>>> a thread - right? Or do we have jobs and transactions separately
>>>>>>>>>>>>> represented there now?
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <
>>>> [email protected]>
>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> We are using multiple transactions in a single job in the case
>>>>>>>>>>>>>> of feeds, and I think that this is the correct way.
>>>>>>>>>>>>>> Having a single job for a feed that feeds into multiple
>>>>>>>>>>>>>> datasets is a good thing, since job resources/feed resources
>>>>>>>>>>>>>> are consolidated.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Here are some points:
>>>>>>>>>>>>>> - We can't use the same transaction id to feed multiple
>>>>>>>>>>>>>> datasets. The only other option is to have multiple jobs, each
>>>>>>>>>>>>>> feeding a different dataset.
>>>>>>>>>>>>>> - Having multiple jobs (in addition to the extra memory and
>>>>>>>>>>>>>> CPU used) would then force us either to read data from external
>>>>>>>>>>>>>> sources multiple times, parse records multiple times, etc., or
>>>>>>>>>>>>>> to have synchronization between the different jobs and the feed
>>>>>>>>>>>>>> source within asterixdb. IMO, this is far more complicated than
>>>>>>>>>>>>>> having multiple transactions within a single job, and the costs
>>>>>>>>>>>>>> far outweigh the benefits.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> P.S.: We are also using this for bucket connections in
>>>>>>>>>>>>>> Couchbase Analytics.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <[email protected]
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> If there are a number of issues with supporting multiple
>>>>>>>>>>>>>>> transaction ids and no clear benefits/use-cases, I’d vote for
>>>>>>>>>>>>>>> simplification :)
>>>>>>>>>>>>>>> Also, code that’s not being used has a tendency to "rot", and
>>>>>>>>>>>>>>> so I think that its usefulness might be limited by the time
>>>>>>>>>>>>>>> we’d find a use for this functionality.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> My 2c,
>>>>>>>>>>>>>>> Till
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I'm separating the connections into different jobs in some of
>>>>>>>>>>>>>>>> my experiments... but that was intended to be used for the
>>>>>>>>>>>>>>>> experimental settings (i.e., not for master now)...
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I think the interesting question here is whether we want to
>>>>>>>>>>>>>>>> allow one Hyracks job to carry multiple transactions. I
>>>>>>>>>>>>>>>> personally think that should be allowed, as the transaction
>>>>>>>>>>>>>>>> and the job are two separate concepts, but I couldn't find
>>>>>>>>>>>>>>>> such use cases other than the feeds. Does anyone have a good
>>>>>>>>>>>>>>>> example of this?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Another question is, if we do allow multiple transactions in
>>>>>>>>>>>>>>>> a single Hyracks job, how do we enable the commit runtime to
>>>>>>>>>>>>>>>> obtain the correct TXN id without having it embedded as part
>>>>>>>>>>>>>>>> of the job specification?
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Xikui
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> I am curious as to how feeds will work without this?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> ~Abdullah.
>>>>>>>>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <
>>>> [email protected]
>>>>>>> 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory,
>>>>>>>>>>>>>>>>>> which allows one Hyracks job to run multiple Asterix
>>>>>>>>>>>>>>>>>> transactions together.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> This class is only used by feeds, and feeds are in the
>>>>>>>>>>>>>>>>>> process of changing to no longer need this feature. As part
>>>>>>>>>>>>>>>>>> of the work on pre-deploying job specifications to be used
>>>>>>>>>>>>>>>>>> by multiple Hyracks jobs, I've been working on removing the
>>>>>>>>>>>>>>>>>> transaction id from the job specifications, as we use a new
>>>>>>>>>>>>>>>>>> transaction for each invocation of a deployed job.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> There is currently no clear way to remove the transaction
>>>>>>>>>>>>>>>>>> id from the job spec and keep the option of using
>>>>>>>>>>>>>>>>>> MultiTransactionJobletEventListenerFactory.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The question for the group is: do we see a need to maintain
>>>>>>>>>>>>>>>>>> this class, which will no longer be used by any current
>>>>>>>>>>>>>>>>>> code? Or, in other words, is there a strong possibility
>>>>>>>>>>>>>>>>>> that in the future we will want multiple transactions to
>>>>>>>>>>>>>>>>>> share a single Hyracks job, meaning that it is worth
>>>>>>>>>>>>>>>>>> figuring out how to maintain this class?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 
