I think that this works, and it seems like the question is how different
operators in the job can get their transaction ids.

~Abdullah.

> On Nov 17, 2017, at 11:21 AM, Steven Jacobs <[email protected]> wrote:
> 
> From the conversation, it seems like nobody has the full picture to propose
> the design?
> For deployed jobs, the idea is to use the same job specification but create
> a new Hyracks job and Asterix Transaction for each execution.
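
A minimal sketch of that per-execution flow; every name below (DeployedJobRunner, TxnIdFactory, runOnce) is a hypothetical placeholder, not the actual AsterixDB/Hyracks API:

import java.util.UUID;

// Sketch: the job specification is compiled and deployed once, while each
// invocation gets a fresh Hyracks job id and a fresh Asterix transaction id.
public class DeployedJobRunner {

    // The pre-compiled specification is created once and reused.
    private final String deployedJobSpecId;

    public DeployedJobRunner(String deployedJobSpecId) {
        this.deployedJobSpecId = deployedJobSpecId;
    }

    // The transaction id is a per-invocation runtime value instead of a
    // field of the job specification.
    public void runOnce() {
        long txnId = TxnIdFactory.nextTxnId();
        String hyracksJobId = UUID.randomUUID().toString();
        System.out.printf("submit spec=%s job=%s txn=%d%n",
                deployedJobSpecId, hyracksJobId, txnId);
    }

    // Hypothetical monotonically increasing transaction id generator.
    static final class TxnIdFactory {
        private static long counter = 0;
        static synchronized long nextTxnId() { return ++counter; }
    }
}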
> 
> Steven
> 
> On Fri, Nov 17, 2017 at 11:10 AM, abdullah alamoudi <[email protected]>
> wrote:
> 
>> I can e-meet anytime (I moved to Sunnyvale). We can also look at a proposed
>> design and see if it can work.
>> Back to my question, how were you planning to change the transaction id if
>> we forget about the case with multiple datasets (feed job)?
>> 
>> 
>>> On Nov 17, 2017, at 10:38 AM, Steven Jacobs <[email protected]> wrote:
>>> 
>>> Maybe it would be good to have a meeting about this with all interested
>>> parties?
>>> 
>>> I can be on-campus at UCI on Tuesday if that would be a good day to meet.
>>> 
>>> Steven
>>> 
>>> On Fri, Nov 17, 2017 at 9:36 AM, abdullah alamoudi <[email protected]>
>>> wrote:
>>> 
>>>> Also, I was wondering how you would do the same for a single dataset
>>>> (non-feed). How would you get the transaction id and change it when you
>>>> re-run?
>>>> 
>>>> On Nov 17, 2017 7:12 AM, "Murtadha Hubail" <[email protected]> wrote:
>>>> 
>>>>> For atomic transactions, the change was merged yesterday. For entity-level
>>>>> transactions, it should be a very small change.
>>>>> 
>>>>> Cheers,
>>>>> Murtadha
>>>>> 
>>>>>> On Nov 17, 2017, at 6:07 PM, abdullah alamoudi <[email protected]>
>>>>> wrote:
>>>>>> 
>>>>>> I understand that is not the case right now, but is that what you're
>>>>>> working on?
>>>>>> 
>>>>>> Cheers,
>>>>>> Abdullah.
>>>>>> 
>>>>>> 
>>>>>>> On Nov 17, 2017, at 7:04 AM, Murtadha Hubail <[email protected]>
>>>>> wrote:
>>>>>>> 
>>>>>>> A transaction context can register multiple primary indexes.
>>>>>>> Since each entity commit log contains the dataset id, you can decrement
>>>>>>> the active operations on the operation tracker associated with that
>>>>>>> dataset id.
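
A minimal sketch of that mechanism, assuming simplified placeholder types (CommitLogRecord, OperationTracker) rather than the actual AsterixDB classes:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: every entity commit log record carries the dataset id, so the
// commit path can look up that dataset's operation tracker and decrement its
// active-operation count, even when one transaction touches several datasets.
public class EntityCommitSketch {

    static class CommitLogRecord {
        final long txnId;
        final int datasetId; // present in every entity commit log record
        CommitLogRecord(long txnId, int datasetId) {
            this.txnId = txnId;
            this.datasetId = datasetId;
        }
    }

    static class OperationTracker {
        final AtomicInteger activeOperations = new AtomicInteger();
    }

    private final Map<Integer, OperationTracker> trackersByDatasetId = new HashMap<>();

    // On entity commit, the dataset id from the log record selects the tracker.
    void onEntityCommit(CommitLogRecord log) {
        OperationTracker tracker = trackersByDatasetId.get(log.datasetId);
        if (tracker != null) {
            tracker.activeOperations.decrementAndGet();
        }
    }
}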
>>>>>>> 
>>>>>>> On 17/11/2017, 5:52 PM, "abdullah alamoudi" <[email protected]>
>>>> wrote:
>>>>>>> 
>>>>>>> Can you illustrate how a deadlock can happen? I am anxious to know.
>>>>>>> Moreover, the reason for the multiple transaction ids in feeds is not
>>>>>>> simply because we compile them differently.
>>>>>>> 
>>>>>>> How would a commit operator know which dataset's active operation
>>>>>>> counter to decrement if they share the same id, for example?
>>>>>>> 
>>>>>>>> On Nov 16, 2017, at 9:46 PM, Xikui Wang <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Yes. That deadlock could happen. Currently, we have one-to-one mappings
>>>>>>>> for the jobs and transactions, except for the feeds.
>>>>>>>> 
>>>>>>>> @Abdullah, after some digging into the code, I think we can probably use
>>>>>>>> a single transaction id for the job which feeds multiple datasets? See
>>>>>>>> if I can convince you. :)
>>>>>>>> 
>>>>>>>> The reason we have multiple transaction ids in feeds is that we compile
>>>>>>>> each connection job separately and combine them into a single feed job.
>>>>>>>> A new transaction id is created and assigned to each connection job,
>>>>>>>> thus for the combined job we have to handle the different transactions
>>>>>>>> as they are embedded in the connection job specifications. But what if
>>>>>>>> we created a single transaction id for the combined job? That
>>>>>>>> transaction id would be embedded into each connection so they can write
>>>>>>>> logs freely, but the transaction would be started and committed only
>>>>>>>> once, as there is only one feed job. In this way, we won't need
>>>>>>>> MultiTransactionJobletEventListener, and the transaction id can be
>>>>>>>> removed from the job specification easily as well (for Steven's change).
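
A minimal sketch of that proposal; the names (ConnectionJob, runCombinedFeedJob, begin, commit) are illustrative placeholders, not the actual feed runtime:

import java.util.List;

// Sketch: one transaction id is created for the combined feed job and handed
// to every connection, so log records from all connections share it, while
// the transaction itself is begun and committed exactly once for the job.
public class CombinedFeedJobSketch {

    interface ConnectionJob {
        void setTxnId(long txnId); // each connection writes logs under this id
        void run();
    }

    void runCombinedFeedJob(List<ConnectionJob> connections, long txnId) {
        begin(txnId);                  // started once for the whole feed job
        for (ConnectionJob c : connections) {
            c.setTxnId(txnId);         // same id embedded into every connection
            c.run();
        }
        commit(txnId);                 // committed once
    }

    private void begin(long txnId) { System.out.println("begin " + txnId); }
    private void commit(long txnId) { System.out.println("commit " + txnId); }
}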
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Xikui
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Thu, Nov 16, 2017 at 4:26 PM, Mike Carey <[email protected]>
>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> I worry about deadlocks.  The waits-for graph may not understand that
>>>>>>>>> making t1 wait will also make t2 wait, since they may share a thread -
>>>>>>>>> right?  Or do we have jobs and transactions separately represented
>>>>>>>>> there now?
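
A contrived, self-contained sketch of that concern, using a Semaphore as a stand-in for a per-transaction lock (this is not AsterixDB code): two "transactions" t1 and t2 share one pipeline thread, t1 blocks on a lock that only the later t2 work would release, and the waits-for graph sees no cycle between them.

import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class SharedThreadHang {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a lock that is owned per transaction, not per thread.
        Semaphore entityLock = new Semaphore(1);

        Thread pipeline = new Thread(() -> {
            try {
                entityLock.acquire(); // work under t2 takes the lock
                entityLock.acquire(); // later work under t1 wants it -> blocks
                // t2's commit, which would release the lock, is never reached
                entityLock.release();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        pipeline.start();
        pipeline.join(TimeUnit.SECONDS.toMillis(1));
        System.out.println("pipeline still blocked? " + pipeline.isAlive());
        pipeline.interrupt(); // let the demo exit
    }
}

The lock manager only sees t1 waiting on t2 (no cycle), while the shared thread is what actually prevents t2 from ever releasing the lock.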
>>>>>>>>> 
>>>>>>>>>> On Nov 16, 2017 3:10 PM, "abdullah alamoudi" <[email protected]>
>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> We are using multiple transactions in a single job in the case of feeds,
>>>>>>>>>> and I think that this is the correct way.
>>>>>>>>>> Having a single job for a feed that feeds into multiple datasets is a
>>>>>>>>>> good thing, since job resources/feed resources are consolidated.
>>>>>>>>>> 
>>>>>>>>>> Here are some points:
>>>>>>>>>> - We can't use the same transaction id to feed multiple datasets. The
>>>>>>>>>> only other option is to have multiple jobs, each feeding a different
>>>>>>>>>> dataset.
>>>>>>>>>> - Having multiple jobs (in addition to the extra resources used, memory
>>>>>>>>>> and CPU) would then force us either to read data from external sources
>>>>>>>>>> multiple times, parse records multiple times, etc., or to have
>>>>>>>>>> synchronization between the different jobs and the feed source within
>>>>>>>>>> AsterixDB. IMO, this is far more complicated than having multiple
>>>>>>>>>> transactions within a single job, and the costs far outweigh the
>>>>>>>>>> benefits.
>>>>>>>>>> 
>>>>>>>>>> P.S.
>>>>>>>>>> We are also using this for bucket connections in Couchbase Analytics.
>>>>>>>>>> 
>>>>>>>>>>> On Nov 16, 2017, at 2:57 PM, Till Westmann <[email protected]>
>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> If there are a number of issues with supporting multiple transaction
>>>>>>>>>>> ids and no clear benefits/use-cases, I’d vote for simplification :)
>>>>>>>>>>> Also, code that’s not being used has a tendency to "rot", and so I
>>>>>>>>>>> think that its usefulness might be limited by the time we’d find a
>>>>>>>>>>> use for this functionality.
>>>>>>>>>>> 
>>>>>>>>>>> My 2c,
>>>>>>>>>>> Till
>>>>>>>>>>> 
>>>>>>>>>>>> On 16 Nov 2017, at 13:57, Xikui Wang wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> I'm separating the connections into different jobs in some of my
>>>>>>>>>>>> experiments... but that was intended to be used for the experimental
>>>>>>>>>>>> settings (i.e., not for master now)...
>>>>>>>>>>>> 
>>>>>>>>>>>> I think the interesting question here is whether we want to allow one
>>>>>>>>>>>> Hyracks job to carry multiple transactions. I personally think that
>>>>>>>>>>>> should be allowed, as the transaction and the job are two separate
>>>>>>>>>>>> concepts, but I couldn't find such use cases other than the feeds.
>>>>>>>>>>>> Does anyone have a good example of this?
>>>>>>>>>>>> 
>>>>>>>>>>>> Another question is, if we do allow multiple transactions in a single
>>>>>>>>>>>> Hyracks job, how do we enable the commit runtime to obtain the
>>>>>>>>>>>> correct TXN id without having it embedded as part of the job
>>>>>>>>>>>> specification?
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Xikui
>>>>>>>>>>>> 
>>>>>>>>>>>> On Thu, Nov 16, 2017 at 1:01 PM, abdullah alamoudi <
>>>>>>>>> [email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> I am curious as to how feeds will work without this?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ~Abdullah.
>>>>>>>>>>>>>> On Nov 16, 2017, at 12:43 PM, Steven Jacobs <[email protected]
>>> 
>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>> We currently have MultiTransactionJobletEventListenerFactory, which
>>>>>>>>>>>>>> allows for one Hyracks job to run multiple Asterix transactions
>>>>>>>>>>>>>> together.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> This class is only used by feeds, and feeds are in the process of
>>>>>>>>>>>>>> changing to no longer need this feature. As part of the work on
>>>>>>>>>>>>>> pre-deploying job specifications to be used by multiple Hyracks
>>>>>>>>>>>>>> jobs, I've been working on removing the transaction id from the job
>>>>>>>>>>>>>> specifications, as we use a new transaction for each invocation of
>>>>>>>>>>>>>> a deployed job.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> There is currently no clear way to remove the transaction id from
>>>>>>>>>>>>>> the job spec and keep the option for
>>>>>>>>>>>>>> MultiTransactionJobletEventListenerFactory.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The question for the group is, do we see a need to maintain this
>>>>>>>>>>>>>> class that will no longer be used by any current code? Or, in other
>>>>>>>>>>>>>> words, is there a strong possibility that in the future we will
>>>>>>>>>>>>>> want multiple transactions to share a single Hyracks job, meaning
>>>>>>>>>>>>>> that it is worth figuring out how to maintain this class?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>> 
>> 
